Abstract

The  research  described  here  is  concerned  with  developing  a system  for 
adaptively  processing  the  outputs  of  an  array  of  sensing  elements  subject 
to  certain  optimality  and  processing  time  constraints.  Before  the  sensor 
outputs  are  passed  to  the  adaptive  processor,  the  sensing  elements  are 
combined  to  form  a set  of  fixed  beams.  Each  of  the  preformed  beams  is 
pointed  in  a different  desired  look  direction,  and  it  is  on  these  preformed 
beams,  after  spectral  analysis  by  the  Fast  Fourier  Transform  (FFT) , that 
the  adaptive  part  of  the  processor  will  operate.  The  adaptive  part  of  the 
processor  is  designed  to  be  a stochastic  adaptive  filter  to  be  able  to 
handle  the  inputs  to  the  processor  which  are  random  processes.  The  processor 
is  also  recursive  so  as  to  be  easily  updated  as  the  environment  changes. 

In  the  development  of  the  theory  of  the  stochastic  adaptive  filter,  the 
orthogonal  projection  lemma  is  used  to  perform  the  minimization  of  the  complex 
variable  problem  resulting  from  these  frequency  domain  considerations.  Using 
the  results  of  the  minimization,  a special  recursive  algorithm,  used  to  obtain 
the  matrix  filter  coefficients  used  in  the  adaptive  processor,  is  derived  from 
the  theory  of  stochastic  approximation. 

From the convergence proofs of the stochastic adaptive algorithm comes
the fact that the sequence that describes the recursive filter is a martingale.
This fact is of general use because it can be used not only to prove convergence
of a recursive stochastic algorithm, but also to show stability in the sense
of a stochastic control system.

The special recursive stochastic algorithm is shown to be a frequency
domain multi-input, multi-output adaptive realization of the Wiener filter.
Operating as an adaptive processor, its goals are to gain a fast increase in
output signal-to-noise ratio (S/N) and yet maintain the statistical smoothing
characteristics necessary for practical real time use of an adaptive processor.

CHAPTER I

INTRODUCTION  AND  PROBLEM  STATEMENT 
1.0  Introduction 

In diverse fields such as communication systems (Allen, 2; Mermoz,
87; Muellar, 89) and seismic processing systems (Burg, 22), arrays of
sensors (antennas, seismometers) are used to form beam patterns, and the
beam outputs serve as inputs to a processor which, in general, performs
further signal processing. In this research, the processing is designed
to be adaptive. Each input to the processor is a stochastic
process  consisting  of  a desired  signal  (s^)  plus  interfering  noise 
(n^)  plus  independent  noise  (sn^)  (Figure  1) . 

The  research  described  here  is  concerned  with  developing  a system 
for  adaptively  processing  the  outputs  of  an  array  of  sensing  elements 
subject  to  certain  optimality  (mean  square  error)  and  processing  time 
constraints.  The  adaptive  part  of  the  processor  is  designed  to  be  a 
stochastic  adaptive  filter  which  must  efficiently  filter  inputs  which 
are  complex  random  processes.  The  processor  is  also  recursive  so  as  to 
be  easily  updated  as  the  environment  changes. 

The  adaptive  processing  system  does  not  have  to  be  tied  to  any 
type  of  array  processing  system  but  can  be  applied  to  any  root  finding 
problem  that  can  be  put  into  the  form  of  the  stochastic  approximation 
algorithm.  As  soon  as  the  stochastic  adaptive  processor  is  put  into  the 
framework  of  stochastic  approximation,  it  will  become  clear  that  the 
present algorithm has applicability to areas such as data communications
(Muellar and Spaulding, 90), radio astronomy (Sakrison, 109), control
system identification (Chien and Fu, 24) and biology (Cochran and Davis,
26).

In the development of the theory of the stochastic adaptive
filter, the orthogonal projection lemma is used to perform the
minimization of the complex variable problem resulting from the input
signal considerations. Using the results of the minimization, a special
recursive algorithm is derived from the theory of stochastic
approximation. This recursive algorithm is used to obtain the matrix
filter coefficients for the adaptive processor.

The special recursive stochastic algorithm is shown to be a
frequency domain multi-input, multi-output adaptive realization of the
Wiener filter. Operating as an adaptive processor, its goals are to gain
a fast increase in output signal-to-noise ratio (S/N) and yet maintain
the statistical smoothing characteristics necessary for practical real
time use of an adaptive processor.

The key point in the above statement is that the adaptive
processor is a specially derived stochastic adaptive processor. The
stochastic nature of the underlying processes takes the central role in
the derivation of the stochastic adaptive filter.

The concept used in this research can be viewed as a stochastic
decoupled data processor. The complete data processing system can be
viewed as an adaptive system which has N inputs and decouples these
inputs by putting them through an N input, N output linear system
(Figure 2). This linear system contains an adaptive filter derived from
a recursive stochastic approximation algorithm. The linear system is
adaptive, and it decouples input signals in the statistical sense of
decorrelation by removing the unwanted effect of interference signals
from one input to another.

Figure 2. Stochastic Adaptive System (N input, N output linear system)

Although no effort has been made to bridge the gap between the
present algorithm and decoupling (Falb and Wolovich, 43) in multivariable
control systems, there are many similarities. In decoupling of
multivariable  control  systems,  it  is  desired  to  have  inputs  control 
outputs  independently,  i.e.,  a single  input  influences  a single  output. 
This  is,  in  essence,  the  goal  of  the  adaptive  processor.  Even  though 
the  mathematical  techniques  involved  in  the  two  cases  are  different, 
both  methods  obtain  a so-called  decoupling  matrix  (or  filter)  which 
performs  the  stated  purpose. 

Schweppe  (114)  considered  the  maximum  likelihood  solution  for  a 
class  of  adaptive  processors  where  he  wished  to  decouple  J signals 
from  K sensors  by  forming  J independent  beams.  This  idea  is  just 
an  application  of  the  general  ideas  of  statistical  estimation  and 
detection  to  the  array  processing  problem  and  was  not  designed  to  be  an 
adaptive  processor  in  the  same  sense  as  is  used  in  this  research.  It 
does,  however,  contain  the  first  reference  to  a so-called  decoupled 
beam  data  processor. 

A knowledge  of  the  statistics  of  the  signal  field  as  well  as  the 
noise  field  is  necessary  to  describe  system  performance  in  statistical 
detection  theory  (Van  Trees,  125;  Whalen,  132).  The  adaptive  processor 
in  this  research  assumes  certain  knowledge  of  the  statistics  of  the 
input  signals  and  noises.  The  basic  fact  assumed  is  that  the  signal  in 
a beam  is  uncorrelated  with  all  other  sources  in  a beam.  Since  optimum 
detection systems require complete statistical knowledge, the adaptive
processor  becomes  attractive  in  situations  where  complete  information 
is  not  available. 

It is assumed in this analysis that all signals and noises are
at least wide sense stationary, and no particular statistical
distributions are assumed for the signals and noises in the derivation
of the stochastic adaptive filter.

The succeeding section of this chapter contains a formal statement
of the problem solved and a brief outline of the properties and
innovative qualities of the proposed stochastic adaptive filter. This
section provides some background necessary to understand the derivation
of the adaptive matrix filter from the recursive algorithm, which is
itself derived from stochastic approximation techniques, and the
position of that algorithm in the literature of control systems,
adaptive arrays and stochastic approximation.

1.1  Problem  Statement 

The problem addressed in this research is to derive a complex
stochastic adaptive matrix filter as an approximation to the
multi-input, multi-output Wiener filter. The adaptive filter operates on
the complex stochastic inputs. The recursive algorithm used to derive the
adaptive matrix filter uses a special purpose stochastic approximation
technique. This technique should provide fast improvement in output S/N
irrespective of noise statistics, together with statistical smoothing of
the matrix adaptive filter coefficients. The recursive algorithm and the
resultant stochastic adaptive matrix filter have to be stable, and ought
to be conceptually simple and easily implemented with little computer
storage in a real time environment.

The following is a summary of the features, innovations, and
mathematical techniques used to derive the stochastic adaptive algorithm,
and of the differences between the stochastic adaptive algorithm and
previous deterministic adaptive algorithms. This summary lists the
characteristics of the stochastic adaptive algorithm and acts as a
prelude to a more detailed analysis which gives the important techniques
and properties of the stochastic adaptive algorithm.

1.  In  previous  work,  most  adaptive  filters  were  synthesized  by 
using  time  domain  concepts  such  as  correlations  and  time  delays  in  some 
form  of  tapped  delay  line  filter.  In  this  research,  the  adaptive  filter 
is  implemented  by  using  frequency  domain  concepts  such  as  cross-spectral 
densities  and  multiplication  of  complex  quantities. 

All previous attempts at adaptive processors have been either
single input, single output, or multi-input, single output systems. The
present system is a first attempt at a complex multi-input,
multi-output adaptive processor.

2.  Since the stochastic adaptive algorithm is a frequency domain
implementation of an approximation to a Wiener filter, complex variables
arise naturally in the derivation of the optimum solution, and a special
technique called the orthogonal projection lemma is used to derive the
optimum filter.

3.  Both the stochastic nature of the random processes involved
and the complex variables which arise from the complex input signals
make derivation and implementation of adaptive algorithms difficult. It
is shown, however, that any adaptive algorithm that does not take the
stochastic nature of the underlying signals into account is not as
useful as stochastic algorithms in situations where the underlying
signals are random processes.

The use of a specially derived stochastic approximation procedure
enables the system designer to obtain a fast increase in output S/N and
yet maintain the advantageous statistical smoothing properties of the
stochastic adaptive algorithm, which are not available with any
deterministic gradient algorithm.

4.  From the convergence proofs of the stochastic adaptive
algorithm comes the fact that the sequence that describes the recursive
algorithm is a martingale. This fact is of general use because it
can be used not only to prove convergence of a recursive stochastic
algorithm, but also to show stability in the sense of a stochastic
control system.

The following paragraphs contain a more detailed look at the
possibilities of the adaptive algorithm, the mathematical techniques
used in the derivation of the complex stochastic adaptive system, and its
potential advantages and innovations over deterministic gradient
procedures.

Since the present system involves complex performance measures
which, unlike their real variable counterparts, do not always have
uniquely defined derivatives, new methods for minimization of a
performance functional are necessary. In most minimization problems
involving real variables, there is usually no problem in taking
derivatives of the performance functional. However, in the complex
variable case, even the simple performance measure, mean square error,
does not have a uniquely defined derivative. To use techniques involving
complex variables which do not have a derivative, some other method for
finding the optimum solution must be found.

For the complex case of the recursive algorithm, the
non-analyticity of the performance functional makes it impossible to talk
of differentiation in the usual sense. The technique of orthogonal
projection (Halmos, 53), which does not depend on differentiation, can be
used to find the optimum solution. This mathematical procedure is very
general and can be applied to a wide variety of situations. Kalman (65),
in one of the orthogonal projection lemma's more famous applications in
the engineering literature, applied the technique to derive the minimum
variance unbiased state estimator which bears his name. The orthogonal
projection lemma can be applied to situations where conventional
derivatives cannot be used, and it is used here to derive the optimum
complex frequency domain solution to the minimization problem.
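As a rough illustration of how orthogonal projection avoids differentiation, the following sketch treats a hypothetical single-channel case rather than the matrix filter derived in this research. The optimum complex weight is obtained by requiring the error to be orthogonal to the input under the inner product <u, v> = mean(u conj(v)), which gives w = <d, x> / <x, x> without taking any derivative. The names `true_w`, `cgauss` and `inner`, the noise level and the sample size are illustrative assumptions only.

```python
import random

random.seed(0)

def cgauss():
    """Complex sample with independent unit-variance real and imaginary parts."""
    return complex(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0))

def inner(u, v):
    """Sample inner product <u, v> = mean(u * conj(v))."""
    return sum(a * b.conjugate() for a, b in zip(u, v)) / len(u)

true_w = 0.8 - 0.6j                                   # hypothetical weight to recover
x = [cgauss() for _ in range(2000)]                   # complex input samples
d = [true_w * xi + 0.3 * cgauss() for xi in x]        # desired signal plus noise

# Orthogonality condition <d - w x, x> = 0 solved for w directly.
w = inner(d, x) / inner(x, x)
err = [di - w * xi for di, xi in zip(d, x)]

print("estimated weight:", w)
print("|<error, input>|:", abs(inner(err, x)))        # zero up to rounding error
```

The residual inner product vanishes to within floating point error by construction, which is the sample version of the orthogonality condition; the multi-input, multi-output case replaces the scalar division by a matrix inverse.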

Since all the signals involved in the use of the adaptive algorithm
are stochastic processes, the success of any adaptive algorithm depends
on taking the stochastic nature of these random processes into account.

A special purpose stochastic approximation method is derived to
calculate the optimum filter by use of a stochastic recursive algorithm.
The misadjustment or variance of the filter weights for any deterministic
algorithm can be shown to be inversely proportional to the convergence
time of the algorithm, i.e., it grows with the speed of the algorithm.
Thus, for a fast increase in output S/N, there is a large
misadjustment in the filter weights. The attainment of both small
variance in the filter weights and fast convergence (fast increase in
S/N) is shown to be impossible in any algorithm which does not take
into account the stochastic nature of the signals involved. It is
shown that all the gradient type algorithms derived from the mean square
error functional are just approximations to stochastic approximation,
and since the processes involved in the adaptation are stochastic
processes, stochastic approximation is a much more logical choice for
the adaptive algorithm than any deterministic gradient type algorithm.

The recursive stochastic adaptive algorithm used to calculate
the adaptive filter uses a special infinite sequence for the variable
gain. This special variable gain is derived from convergence and
stability considerations of the idealized form of the recursive adaptive
algorithm. This variable gain allows the stochastic adaptive algorithm
to operate near the stability boundary of the recursive algorithm. By
operating at the stability boundary, the adaptive algorithm has fast
initial convergence, and yet maintains the smoothing properties one
obtains from using a stochastic approximation method. It would be
impossible to operate a deterministic algorithm at the stability limit
of the algorithm because the deterministic algorithms possess no
smoothing properties, and the calculated filter matrix would have
extremely high variance and be unusable in a practical system.
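The trade-off between speed and misadjustment, and the smoothing effect of a decreasing gain, can be sketched with a scalar complex recursion w <- w + gain(n) e_n conj(x_n). This is only an illustration under stated assumptions: the decreasing gain 0.5/n is the classical Robbins-Monro style choice, not the special gain sequence derived in this research, and the constant gain 0.2, the noise level and the names `TRUE_W`, `run` and `tail_mean` are hypothetical.

```python
import random

random.seed(1)

def cgauss():
    """Complex sample with independent unit-variance real and imaginary parts."""
    return complex(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0))

TRUE_W = 0.8 - 0.6j     # hypothetical optimum weight
STEPS = 5000            # number of recursions
TAIL = 1000             # iterations used to measure the residual fluctuation

def run(gain):
    """Iterate w <- w + gain(n) * e_n * conj(x_n); return |w_n - TRUE_W|^2 per step."""
    w = 0j
    sq_err = []
    for n in range(1, STEPS + 1):
        x = cgauss()
        d = TRUE_W * x + 0.3 * cgauss()      # noisy desired signal
        e = d - w * x                        # instantaneous error
        w += gain(n) * e * x.conjugate()     # stochastic correction
        sq_err.append(abs(w - TRUE_W) ** 2)
    return sq_err

def tail_mean(values):
    """Average squared weight error over the last TAIL iterations."""
    return sum(values[-TAIL:]) / TAIL

constant = run(lambda n: 0.2)        # fixed gain: weights keep fluctuating
decreasing = run(lambda n: 0.5 / n)  # decreasing gain: fluctuations are averaged out

print("constant gain, tail mean squared weight error:  ", tail_mean(constant))
print("decreasing gain, tail mean squared weight error:", tail_mean(decreasing))
```

With a fixed gain the weight error settles to a noise floor set by the gain, whereas the decreasing gain continues to average the noise; how quickly each reaches the neighborhood of the optimum depends on the gain constants chosen.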

Due to properties inherent in the recursive algorithm of
stochastic approximation, the method usually exhibits slow long term
convergence. This attribute is mainly due to the time variable gain
sequence. The real advantage of using the special time variable gain
sequence derived from stability considerations of the idealized
algorithm is that one obtains the maximum convergence rate obtainable
within the stability limits and yet maintains small misadjustment in the
filter coefficients.

In many applications, the practical usefulness of adaptive arrays
is limited by their convergence rate. The adaptively controlled filter
coefficients must change at a rate equal to or greater than the rate of
change of the external random inputs. The convergence rate problem is
most severe in systems where the eigenvalues of the input covariance
matrix of beams differ by several orders of magnitude. The method
proposed in this research makes use of the distribution of eigenvalues to
obtain fast convergence no matter how widely separated the eigenvalues
are. No deterministic adaptive system (including matrix inversion) can
obtain as fast convergence properties without suffering the undesirable
result of extremely large filter coefficient variance.

Certain conditions are formulated which guarantee convergence of
any recursive algorithm using stochastic approximation. The proof of
convergence of the form of recursive algorithm used in this research
shows that the sequence of solutions of the recursive algorithm is a
martingale. This fact is used not only in the convergence proof of the
stochastic approximation algorithm, but also to show that the adaptive
algorithm, when formulated in the context of a state variable control
system, is stable in terms of stochastic stability.

The convergence of the stochastic adaptive algorithm is
equivalent to the stability of systems described by stochastic difference
equations. The stability of this type of system must be considered
in a probabilistic sense so that stochastic analogs of Lyapunov
functions can be defined. It is shown that the stochastic Lyapunov
functions are martingales, and all the useful properties of martingales
(see Appendix C) accrue to these stochastic Lyapunov functions.

The purpose of an adaptive processor is to adapt to changes in the
environment and to remove the effect of interfering noises from the
output of the processor.

An intuitive meaning of the problem of removing any interfering
signals from a given input can be gleaned from the meaning of
diagonalization of the power spectral density matrix of beam outputs.
When the power spectral density matrix of beam outputs is diagonalized,
the diagonalized matrix represents the covariance matrix of inputs in
terms of a set of orthogonal vectors. In the statistical sense, this
diagonalization decorrelates the beam outputs and makes the beams
independent in the Gaussian case (Van Trees, 125).
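The decorrelating effect of diagonalization can be checked numerically on a small example. The sketch below assumes a hypothetical 2 x 2 Hermitian power spectral density matrix for two beams, P = [[a, b], [conj(b), c]], and uses the closed-form eigen-decomposition of a 2 x 2 Hermitian matrix; the values of `a`, `b`, `c` and the helper names are illustrative and are not taken from this research.

```python
import math

def hermitian_2x2_eig(a, b, c):
    """Eigenvalues and unit eigenvectors of [[a, b], [conj(b), c]] (a, c real, b != 0)."""
    mean = 0.5 * (a + c)
    radius = math.sqrt((0.5 * (a - c)) ** 2 + abs(b) ** 2)
    eigvals = (mean + radius, mean - radius)
    eigvecs = []
    for lam in eigvals:
        v = (b, lam - a)                      # satisfies (P - lam I) v = 0
        norm = math.sqrt(abs(v[0]) ** 2 + abs(v[1]) ** 2)
        eigvecs.append((v[0] / norm, v[1] / norm))
    return eigvals, eigvecs

def quad(u, matrix, v):
    """Hermitian form u^H * matrix * v."""
    return sum(u[i].conjugate() * matrix[i][j] * v[j]
               for i in range(2) for j in range(2))

a, b, c = 2.0, 0.6 + 0.8j, 1.0                # hypothetical beam powers and cross-spectrum
P = [[a, b], [b.conjugate(), c]]
eigvals, eigvecs = hermitian_2x2_eig(a, b, c)

# In the eigenvector basis the cross term vanishes: the transformed beams are uncorrelated.
print("cross term after diagonalization:", abs(quad(eigvecs[0], P, eigvecs[1])))
print("powers after diagonalization:", quad(eigvecs[0], P, eigvecs[0]).real,
      quad(eigvecs[1], P, eigvecs[1]).real)
```

The off-diagonal term in the new basis is zero to rounding error, and the diagonal terms equal the eigenvalues, which is the sense in which diagonalization decorrelates the beam outputs.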

The  adaptive  processing  system  is  designed  to  extract  a particular 
desired  signal  from  a given  input  in  the  presence  of  many  other  signals 
which  are  considered  to  be  interfering  noises.  We  shall  address  the 
difficult  problem  of  acquiring  a weak  desired  signal  in  the  presence  of 
strong  interference  by  the  use  of  adaptive  arrays. 

1.2  Outline 

Since the background, derivation and use of the stochastic
adaptive processor use concepts from a wide range of mathematical and
engineering areas, a brief introduction to each of the major concepts
is given in Chapter II. An introduction to array theory is presented
in Section 2.1. This is presented as background to the particular type
of problem solved in the experimental tests. A brief summary of linear
system theory, especially the topic of Wiener filtering and the
approximations to the Wiener filter, is given in Section 2.3. The
concepts of linear systems and Wiener filtering are needed to understand
how the present adaptive algorithm fits into the framework of a
detection system. Concepts needed from state variable control systems and
probability and statistics as applied to stochastic processes appear in
the appendices. Concepts from digital signal processing are used
liberally in the adaptive processor solution and application.

Since tapped delay line filters are by far the most common
time domain implementation of the Wiener filter, a description of their
derivation and the assumptions inherent in them is given in Section 2.3.
Various adaptive algorithms considered in this research are documented
in Section 2.4. This section gives insight as to how the present
adaptive algorithm differs from previous ones.

Chapter III contains the probabilistic techniques necessary for
the derivation and use of the stochastic adaptive filter. This chapter
contains a history of the technique called stochastic approximation.
The proofs deriving the sufficient conditions for minimum mean square
error and the probabilistic convergence criteria are given in Section 3.2.
The interpretation of mean square estimation is given in this section.
Section 3.2 also contains the derivation of the regression function
to be used in the recursive algorithm that calculates the adaptive
filter. The convergence proof of the stochastic approximation algorithm
in Section 3.3 establishes the important result that the sequence of
solutions of the adaptive filter is a martingale. The last section in
this chapter contains the statistical and geometrical significance of
the results of the previous sections.

Chapter IV contains the derivation and proof of convergence of
the stochastic adaptive filter. Use is made of the orthogonal projection
lemma to derive the optimum solution to the minimum mean square
error problem. Section 4.2 establishes the meaning of the decoupling
concept as it applies to the adaptive filter.

The dynamic properties of the stochastic adaptive algorithm are
derived in Chapter V. The derivation of the optimum gain sequence from
considerations of the idealized form of the adaptive algorithm and the
extended convergence conditions are contained in Section 5.1. Section
5.2 gives the stochastic stability considerations and shows that the
stochastic Lyapunov functions are martingales. The last section unifies
all LMS type algorithms by using the framework of stochastic
approximation, but points out the fallacy of using deterministic
algorithms (LMS type, etc.) when the processes involved in the adaptive
processor are stochastic processes.

Chapter VI contains the construction of the computer simulation
and lists the generic cases tested. The influence of the gain sequence
and the optimum gain constant on convergence rates is illustrated by
the computer simulation results. The physical reasoning and justification
for using implied constraints in some experimental tests, and the
fact that these cases do not differ from the cases with no constraints
for all but the long term operation of the recursive algorithm, are given
in Sections 6.2 and 6.3. The influence of a search strategy on long
term convergence and a special stopping rule are discussed in Section

The last chapter contains a summary of the work completed and the
conclusions reached. It also contains proposals for future work and
extensions into other areas.


CHAPTER  II 


BACKGROUND  THEORY 


2.0  Introduction 

The present chapter gives the basic background for the detection
and estimation problem, shows the connection of Wiener and Kalman
filtering to the detection and estimation problem and shows how the
present stochastic adaptive processor approximates the Wiener filter
solution. Although no particular type or special characteristic
of an array of sensing elements is necessary for the derivation and
use of the stochastic adaptive algorithm, a brief summary of the
properties of arrays is included. This summary of array theory gives a
firm basis for understanding the physical considerations involved in the
experimental verification of the stochastic adaptive algorithm.

The following sections give the background of general detection
and estimation systems and show the use of various previous optimum and
adaptive methods to increase detection system performance. They show
the complexity of true optimum systems and that, without some
simplifying considerations, the implementation of the true detection or
estimation systems in real time would not be feasible.

We illustrate the derivations for the optimum filters for both
stationary (Wiener filter) environments and non-stationary (Kalman
filter) environments and show various ways to implement a multichannel
Wiener filter.

Almost all previous attempts at adaptive processors have been
some form of time domain tapped delay line filter using deterministic
minimization procedures. The tapped delay line filter is derived and
its implementation is discussed. Since this filter structure is by far
the most used time domain approximation to the Wiener filter, it is
important to discuss its characteristics and limitations. The
distinction between this time domain approach and the frequency domain
approach taken here is important in understanding the differences and
advantages of the frequency domain stochastic adaptive filter proposed
in this research.

The  last  section  delineates  the  various  approximations  and 
adaptive  system  approaches  to  the  Wiener  filter  problem,  and  shows  the 
various  assumptions  adaptive  system  designers  have  used  to  obtain 
realistic  approximations  to  the  optimum  Wiener  filter.  It  contains  a 
summary  of  some  of  the  most  important  contributions  to  the  adaptive 
array  literature  and  a brief  discussion  of  the  differences  of  the 
various  adaptive  techniques  attempted. 


2.1  Array  Theory 

Using results contained in Allen (2), we can make clear the use
of an array of sensing elements in the recursive stochastic algorithm.
Let us assume that there are N isotropic sensors at arbitrary positions
specified by a set of vectors $\mathbf{v}_n$ (Figure 3). The contribution
of the $n$th element to the far field at some point at a distance R
is

$$a_n \, e^{jk(\mathbf{v}_n \cdot \mathbf{e}_K)} \tag{2.1}$$

where $(\mathbf{v}_n \cdot \mathbf{e}_K)$ is the perpendicular distance of the $n$th element
from a plane through the origin perpendicular to the direction $\mathbf{e}_K$,
$\lambda$ is the wavelength (with $k = 2\pi/\lambda$), and $a_n$ represents the
amplitude response of the $n$th element. The product
$k(\mathbf{v}_n \cdot \mathbf{e}_K)$ gives the phase shift of the
signal (at distance R) relative to the reference point of the array.

Figure 3. Array Geometry

The total expression for the field strength for the far field
pattern is the sum of the contributions from all of the elements:

$$f(\mathbf{e}_K) = \sum_{n=0}^{N-1} a_n \, e^{jk(\mathbf{v}_n \cdot \mathbf{e}_K)} \tag{2.2}$$

If we consider the simple case of an array of sensors equally
spaced along a line, then Equation (2.2) becomes:

$$f(\alpha) = \sum_{n=0}^{N-1} a_n \, e^{jknD\sin\alpha} \tag{2.3}$$

where $k(\mathbf{v}_n \cdot \mathbf{e}_K)$ becomes $k(nD\sin\alpha)$, the phase shift of the $n$th
element, $\alpha$ is the complement of the spherical angle $\theta$, and $D$ is
the spacing between elements.

We find that the normalized magnitude of Equation (2.3) with the
amplitude factors $a_n = 1$ is of the form:

$$\left| \frac{\sin Nx}{N \sin x} \right| \tag{2.4}$$

where $x = (\pi D/\lambda)\sin\alpha$.
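The agreement between the element sum of Equation (2.3) and the closed form of Equation (2.4) can be checked numerically. The sketch below assumes unit amplitudes, the usual wavenumber k = 2*pi/lambda, and hypothetical values of eight elements at half-wavelength spacing; the function names are illustrative.

```python
import cmath
import math

def array_factor(alpha, n_elements, spacing, wavelength):
    """Normalized magnitude of the element sum in Equation (2.3) with a_n = 1."""
    k = 2.0 * math.pi / wavelength                     # assumed k = 2*pi/lambda
    total = sum(cmath.exp(1j * k * n * spacing * math.sin(alpha))
                for n in range(n_elements))
    return abs(total) / n_elements

def closed_form(alpha, n_elements, spacing, wavelength):
    """Closed form |sin(N x) / (N sin x)| with x = (pi D / lambda) sin(alpha)."""
    x = (math.pi * spacing / wavelength) * math.sin(alpha)
    if abs(math.sin(x)) < 1e-12:                       # limiting value at a lobe maximum
        return 1.0
    return abs(math.sin(n_elements * x) / (n_elements * math.sin(x)))

N, D, LAM = 8, 0.5, 1.0                                # hypothetical array parameters
for deg in (0, 5, 10, 20, 45, 80):
    a = math.radians(deg)
    print(deg, array_factor(a, N, D, LAM), closed_form(a, N, D, LAM))
```

The two columns agree to rounding error at every angle, and the value at zero degrees is the main lobe maximum of one.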

The maximum of Equation (2.4) occurs at the origin and is
referred to as the main lobe. The secondary maxima of Equation (2.4) are
called side lobes. The number of elements, the size of the aperture,
and the amplitude illumination factors $a_n$ determine the
characteristics of the array. One use of the proposed stochastic adaptive
algorithm would be to allow relatively high side lobes in the pattern,
thereby reducing the cost and complexity of the array.
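
The relation between the sum in Equation (2.3) and the closed form in
Equation (2.4) can be checked numerically. The following is a minimal
sketch, not part of the original processor: it assumes the usual
wavenumber $k = 2\pi/\lambda$ and unit amplitude factors, and the
function names and the 8-element, half-wavelength array are
illustrative choices only.

```python
import cmath
import math


def array_factor(alpha, n_elements, spacing_wavelengths):
    """Normalized magnitude of Equation (2.3) with all a_n = 1.

    alpha is in radians; spacing_wavelengths is D / lambda.
    Assumes k = 2*pi/lambda, so k*D = 2*pi*(D/lambda).
    """
    k_d = 2.0 * math.pi * spacing_wavelengths
    total = sum(cmath.exp(1j * k_d * n * math.sin(alpha))
                for n in range(n_elements))
    return abs(total) / n_elements


def closed_form(alpha, n_elements, spacing_wavelengths):
    """Equation (2.4): |sin(N x) / (N sin x)|, x = (pi D / lambda) sin(alpha)."""
    x = math.pi * spacing_wavelengths * math.sin(alpha)
    if abs(math.sin(x)) < 1e-12:
        return 1.0  # limiting value where numerator and denominator both vanish
    return abs(math.sin(n_elements * x) / (n_elements * math.sin(x)))


if __name__ == "__main__":
    for degrees in (0, 5, 10, 20, 40):
        a = math.radians(degrees)
        print(degrees, array_factor(a, 8, 0.5), closed_form(a, 8, 0.5))
```

For these assumed values the first null of the pattern falls where
$\sin\alpha = \lambda/(ND)$, which is where the numerator of Equation
(2.4) first vanishes away from the main lobe.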

One of the attractive features of the array configuration is
that the main beam of the array can be pointed without moving the array
itself (Allen, 2). If we desire to point the main beam at an angle $\alpha_0$
from the perpendicular to the array, then Equation (2.3) becomes

$$f(\alpha, \alpha_0) = \sum_{n=0}^{N-1} a_n \, e^{jknD[\sin\alpha - \sin\alpha_0]}$$  (2.5)

and the main beam is translated from the point $\alpha = 0$ to the point
$\alpha = \alpha_0$.
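
The translation of the main beam described by Equation (2.5) can be
illustrated with a short numerical search for the peak of the steered
pattern. This is only a sketch under assumed values (unit amplitude
factors, $k = 2\pi/\lambda$, an 8-element array at half-wavelength
spacing, and a hypothetical helper named peak_angle).

```python
import cmath
import math


def steered_pattern(alpha, alpha0, n_elements, spacing_wavelengths):
    """Normalized magnitude of Equation (2.5) with all a_n = 1."""
    k_d = 2.0 * math.pi * spacing_wavelengths  # assumes k = 2*pi/lambda
    u = math.sin(alpha) - math.sin(alpha0)
    total = sum(cmath.exp(1j * k_d * n * u) for n in range(n_elements))
    return abs(total) / n_elements


def peak_angle(alpha0, n_elements=8, spacing_wavelengths=0.5, steps=3601):
    """Return the grid angle (radians) at which the steered pattern is largest."""
    grid = [math.radians(-90.0 + 180.0 * i / (steps - 1)) for i in range(steps)]
    return max(grid, key=lambda a: steered_pattern(
        a, alpha0, n_elements, spacing_wavelengths))


if __name__ == "__main__":
    steer = math.radians(20.0)
    print(math.degrees(peak_angle(steer)))
```

With half-wavelength spacing no grating lobe enters the visible region
for this steering angle, so the search returns a single maximum at the
steered direction.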

If we have the location of point sources as in Figure 4, then
the array output would be the sum of the response of the array to each
source. For any given main beam, only one source could be steered on
and the other sources would enter the beam as extraneous signals or
noises.

Figure 4. Field Strength at Different Angles

Mathematically this would be

$$f_{\text{TOTAL BEAM 1}} = \text{Array output for beam 1} = c_{1,1}s_1 + c_{1,2}s_2 + c_{1,3}s_3 + \cdots + c_{1,n}s_n$$  (2.6)

where $c_{i,j}$ represents the proportion of source $s_j$ that appears in
beam $i$; $c_{n,n}$ is always taken to be one because it is the constant
representing the position of the signal $s_n$ in the beam $b_n$, and this
signal is considered the signal steered on in that beam and not a noise
signal.

The output of the array comes not only from the signal that the
particular beam is steered on (the signal direction is known in
each beam), but also from any signals in the far field of that particular
beam. However, since all other signals except the one steered on are
not at the center of the main lobe, they contain less power in this
beam than in the beam which steers on them. The same can be said of
any steered beam. We can summarize by stating that the output power of
the processor due to a directional noise depends on the direction of
propagation of the noise and its power spectrum, as well as the geometry
of the array and each beam's particular steering direction.

We  call  a process  with  certain  spatial  characteristics  a signal 
if  it  impinges  on  the  sensors  in  a main  beam,  but  we  call  the  same 
spatial  process  a noise  in  any  other  beam.  In  this  research,  it  is 
assumed  that  there  are  multiple  main  beams.  Other  spatial  processes 
such as self-noise and environment noise are considered noises in all
beams.

We would like to design a processor which would detect signals
anywhere inside the 3 dB points of the main beam but reject anything
outside these points (Figure 4). In designing the beam outputs of an
array, the designer is limited by the size (aperture) of the array and
the number of sensors. These considerations place limits on the width
of the main beam and the amplitude of the side lobes. The purpose of
any processing system is to improve the detection system performance
using a given array. With the stochastic adaptive technique being
proposed, the side lobe level of a beam can be relatively high as long
as the directionality of the main beams is maintained, and the adaptive
processor will reject unwanted signals from the main beam (outside the
3 dB points) and the side lobes so as to improve the system's detection
performance. These facts reduce the constraints on the array designer
and reduce the complexity of the array design while improving detection
system performance by reducing the false alarm rate.

2.2 Detection Theory

It  is  a fact  from  statistical  detection  theory  that  both  the 
Neyman-Pearson  and  Bayes  detection  criteria  lead  to  a likelihood-ratio 
detector  (Van  Trees,  125).  The  objective  of  any  filter  used  before 
detection  is  to  improve  the  probability  of  detection  by  increasing  the 
signal-to-noise  ratio  (S/N)  at  the  output  of  the  filter  which  is  the 
input  to  the  likelihood-ratio  detector. 

When  the  statistics  of  the  signal  are  known  exactly  and  the  noise 
is  white  Gaussian  noise,  the  optimum  detector  is  a matched  filter 
(Whalen, 132). This is a relatively simple system and much knowledge
has  been  accumulated  on  matched  filter  systems.  However,  the  matched 
filter  assumes  complete  statistical  information  about  the  input 
processes. 

The problem of detecting Gaussian signals in additive Gaussian
noise fields was studied by Bryn (20), who showed that, assuming $K$
antenna elements in the array, the Bayes optimum detector could be
implemented by the measurement and inversion of a $2K$ by $2K$
correlation matrix. Mermoz (87) proposed a similar scheme for known narrowband
signals, using the signal-to-noise ratio (S/N) as a performance criterion.
It is obvious that for any reasonable size array (50 sensors or more)
the optimum Bayes detector would be almost impossible to implement in
real time.

Let us now discuss the detection of an unknown signal (Goode, 50).
If we assume Gaussian statistics and make the following assumptions

$$\mathbf{x}(t) = \mathbf{s}(t) + \mathbf{n}(t) ,$$  (2.7)

$$E\{\mathbf{x}(t)\} = 0 = E\{\mathbf{n}(t)\} ,$$  (2.8)

$$E\{\mathbf{x}(t)\,\mathbf{x}^T(u)\} = \mathbf{R}_{xx}(t,u) ,$$  (2.9)

$$E\{\mathbf{s}(t)\,\mathbf{n}^T(u)\} = 0 ,$$  (2.10)

then the stochastic signal is completely described in the statistical
sense. The optimum detection system (Figure 5) for the preceding
problem comes from the solution of the following integral equation
(Middleton and Groginsky, 88)

$$\int_0^T\!\!\int_0^T \mathbf{R}_{nn}(t,u)\,\mathbf{K}(u,v)\,[\mathbf{R}_{nn}(v,\tau) + \mathbf{R}_{ss}(v,\tau)]\,du\,dv = \mathbf{R}_{ss}(t,\tau) .$$  (2.11)

The likelihood ratio detector which results from the solution of this
integral equation is quite complicated and, for practical reasons,
almost impossible to implement. The array processor design engineer
must simplify the structure of the likelihood ratio detector in order
for it to operate in a real time environment. Any a priori knowledge
of signal or noise statistics, short of complete knowledge in the
optimum case, allows the designer the opportunity to use some adaptive
structure.

Figure 5. Optimum Likelihood Receiver

The standard engineering parameter, signal-to-noise ratio (S/N),
which is defined as the ratio of the average signal power to the average
noise power, is the basic measure of good performance of a detection
system. The family of curves plotted in detection theory to predict the
likelihood ratio detector's performance is referred to as the receiver
operating characteristics (ROC). A study of these curves will show
immediately that the most important variable in those curves is the
signal-to-noise ratio. As the signal-to-noise ratio goes up, the
receiver performance goes up even faster than a linear function (for a
fixed false alarm rate). Any system which increases the signal-to-noise
ratio is invaluable in a detection system.
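
The dependence of detection performance on S/N at a fixed false alarm
rate can be illustrated with the textbook case of a known signal in
Gaussian noise, for which one point on the ROC is
$P_D = Q(Q^{-1}(P_{FA}) - d)$ with $d^2$ the output S/N. This is a
sketch of that simpler case only, not of the unknown-signal detector
of Equation (2.11); the function name and the chosen false alarm
probability are illustrative assumptions.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # zero mean, unit variance


def detection_probability(output_snr, false_alarm_prob):
    """P_D for a known signal in Gaussian noise at a fixed P_FA.

    output_snr is the power signal-to-noise ratio at the detector
    input, i.e. the squared deflection d**2 of the test statistic.
    """
    threshold = _STD_NORMAL.inv_cdf(1.0 - false_alarm_prob)
    return 1.0 - _STD_NORMAL.cdf(threshold - output_snr ** 0.5)


if __name__ == "__main__":
    for snr_db in (0, 3, 6, 9, 12):
        snr = 10.0 ** (snr_db / 10.0)
        print(snr_db, detection_probability(snr, 1e-3))
```

Running the loop for a few values of S/N traces one vertical slice
through the ROC family at the assumed false alarm probability.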

The  adaptive  processor  described  in  this  research  increases  the 
signal-to-noise  ratio  at  the  output  of  the  adaptive  filter  so  as  to 
increase  the  probability  of  success  in  the  likelihood  ratio  detector. 

2.3  Estimation;  Linear  Filters 

2.3.1 Wiener Filter  If we assume that the input process is
corrupted by noise and we want to extract the signal from the noise,
then the problem is one of filtering. The performance measure used to
calculate the Wiener filter is the mean square error between the signal
and a linear estimate of the signal. We have assumed (Section 2.2) that
we know the desired signal $d(t)$, as well as the input signal
correlation matrix, $\mathbf{R}_{ss}(t,u)$. Let us also assume that we know a correlation
vector $\mathbf{r}_{dx}(\tau,u)$ defined by:

$$\mathbf{r}_{dx}(\tau,u) = E[d(\tau)\,\mathbf{x}(u)] .$$  (2.12)

The solution of the following integral equation gives the filter vector,
$\mathbf{h}_o(u,\sigma)$, which minimizes the mean square error,

$$\mathbf{r}_{dx}(t,\sigma) = \int_{T_i}^{T_f} [\mathbf{R}_{nn}(t,u) + \mathbf{R}_{ss}(t,u)]\,\mathbf{h}_o(u,\sigma)\,du .$$  (2.13)

If we assume all processes are stationary and we use a change of
variables

$$\tau = t - \sigma \quad\text{and}\quad v = t - u$$  (2.14)

then we can write Equation (2.13) as:

$$\mathbf{r}_{dx}(\tau) = \int_{0}^{\infty} \mathbf{R}_{xx}(v)\,\mathbf{h}_o(\tau - v)\,dv .$$  (2.15)

The solution $\mathbf{h}_o(t)$ is called a Wiener filter (Wiener, 134).

The  analytical  solution  of  the  Wiener-Hopf  integral  Equation 
(2.15)  requires  spectral  factorization  which  is  difficult  to  implement 
as  a simple  iterative  procedure. 

If certain restrictions (including those made above) are put on
the input signal and the optimum filter, then certain simplifications
can be made.

The specific restrictions on the input $\mathbf{x}(t) = \mathbf{s}(t) + \mathbf{n}(t)$
and on the optimum filter $\mathbf{h}_o(t)$ are:

(1) Both the signal, $\mathbf{s}(t)$, and the noise, $\mathbf{n}(t)$, are
sample functions from random processes that are
at least wide sense stationary.

(2) The filter, $\mathbf{h}_o(t)$, is to be physically
realizable.

If we require that $\mathbf{h}_o(t)$ be physically realizable, then we must
guarantee

$$\mathbf{h}_o(t) = 0 \quad\text{for}\quad t < 0 .$$  (2.16)

Under the assumptions of stationarity and physical realizability,
we can change the lower limit on the integral in Equation (2.15) to
minus infinity and then take the Fourier transform of the resulting
equation. The result is well known and it is

$$\mathbf{h}_o(f) = \mathbf{G}_{xx}^{-1}(f)\,\mathbf{g}_{dx}(f) .$$  (2.17)

What is assumed by Equation (2.17), in most of the adaptive filter
literature, is that in doing the spectral factorization the correct
singularities are selected so as to make the filter realizable. While
both the unrealizable and realizable filters have the same generic form,
they are in fact two different filters because of the types of
singularities contained in each. It is sometimes overlooked that the
filter $\mathbf{h}(f)$ obtained without this selection is optimum but
unrealizable: in general, it possesses singularities in the lower
half plane and hence its Fourier transform does not vanish for negative
time.

Spectral factorization is a time consuming process and difficult
to implement in real time. Anderson et al. (3) discuss a recursive
algorithm for spectral factorization but give no estimate of the time
required to do the factorization. Since the method requires the
recursive solution of a Riccati difference equation, it involves
considerable computational effort.

The  purpose  of  the  proposed  adaptive  algorithm  is  to  obtain  an 
adaptive  approximation  to  the  optimum  Wiener  filter  in  a computationally 
fast  and  efficient  manner  and  to  use  the  resulting  filter  in  a real  time 
system. 


2.3.2 Kalman-Bucy Filter  For optimum state estimation in
nonstationary environments, the Kalman filter gives the best estimate in
the minimum mean-square error sense (Sage, 107). Kalman (63) changed
the classical formulation of the minimum mean square error problem by
using a state variable model for the state estimator. The linear system,
whose state we want to estimate, is assumed driven by white noise
(Figure 6). We wish to find a state estimator that gives the best
estimate $\hat{\mathbf{x}}(t)$ of $\mathbf{x}(t)$ in the minimum mean-square error sense. Thus
we want to find an estimator such that the mean-square estimation error
is a minimum. The equation describing the system model is

$$\dot{\mathbf{x}}(t) = \mathbf{F}(t)\,\mathbf{x}(t) + \mathbf{G}(t)\,\mathbf{w}(t) .$$  (2.19)

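The observation is

$$\mathbf{z}(t) = \mathbf{H}(t)\,\mathbf{x}(t) + \mathbf{v}(t) .$$  (2.20)

It is assumed that the input $\mathbf{w}(t)$ and the measurement noise $\mathbf{v}(t)$ are
zero mean and white, and that they are uncorrelated such that we have

$$E\{\mathbf{w}(t)\,\mathbf{w}^T(\tau)\} = \mathbf{Q}(t)\,\delta(t - \tau) ,$$  (2.21)

$$E\{\mathbf{v}(t)\,\mathbf{v}^T(\tau)\} = \mathbf{R}(t)\,\delta(t - \tau) ,$$  (2.22)

and

$$E\{\mathbf{w}(t)\,\mathbf{v}^T(\tau)\} = E\{\mathbf{v}(t)\,\mathbf{w}^T(\tau)\} = 0 .$$  (2.23)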

We wish to find the estimator that gives the linear minimum mean
square error and is unbiased such that

$$E\{\hat{\mathbf{x}}(t)\} = E\{\mathbf{x}(t)\}$$

and

$$E\{\tilde{\mathbf{x}}^T(t)\,\tilde{\mathbf{x}}(t)\} = \text{minimum}$$  (2.24)

where $\tilde{\mathbf{x}}(t)$ is the estimation error given by:

$$\tilde{\mathbf{x}}(t) = \mathbf{x}(t) - \hat{\mathbf{x}}(t) .$$  (2.25)

The solution to this linear minimum mean square error problem
still results, as in Wiener filtering, in an integral equation. Rather
than solving the integral equation directly, Kalman converts the
integral equation to a nonlinear Riccati type differential equation.
The solution of this Riccati type nonlinear differential equation is the
covariance matrix of the minimum mean square error. Kalman originally
performed this linear mean square error minimization by using the
orthogonal projection lemma as formulated from Hilbert space
considerations.

The orthogonal projection lemma (Halmos, 53) states that the
minimum linear mean square error estimate is orthogonal to the
estimation error. This means that the covariance of the estimate and the
estimation error is zero,

$$\operatorname{cov}[\hat{\mathbf{x}}(t), \tilde{\mathbf{x}}(t)] = 0 .$$  (2.26)

It can be shown, under the assumptions made, that the minimum mean
square error is obtained if we minimize each element of the error
covariance matrix

$$\mathbf{P}(t) = \operatorname{cov}[\tilde{\mathbf{x}}(t), \tilde{\mathbf{x}}(t)] = E\{\tilde{\mathbf{x}}(t)\,\tilde{\mathbf{x}}^T(t)\}$$  (2.27)

separately.

The following equations are the result of the minimization
procedure using the orthogonal projection lemma. The derivation (Sage,
107) appears in many papers and books and only the result is reproduced
here. The Kalman filter is described by the matrix filter

$$\mathbf{K}(t) = \mathbf{P}(t)\,\mathbf{H}^T(t)\,\mathbf{R}^{-1}(t) ,$$  (2.28)

where $\mathbf{P}(t)$ is the solution of the following matrix differential
equation:

$$\dot{\mathbf{P}}(t) = \mathbf{F}(t)\,\mathbf{P}(t) + \mathbf{P}(t)\,\mathbf{F}^T(t) - \mathbf{P}(t)\,\mathbf{H}^T(t)\,\mathbf{R}^{-1}(t)\,\mathbf{H}(t)\,\mathbf{P}(t) + \mathbf{G}(t)\,\mathbf{Q}(t)\,\mathbf{G}^T(t)$$  (2.29)

with the initial conditions

$$\mathbf{P}(t_0) = E[\tilde{\mathbf{x}}(t_0)\,\tilde{\mathbf{x}}^T(t_0)] = \operatorname{cov}[\tilde{\mathbf{x}}(t_0), \tilde{\mathbf{x}}(t_0)] .$$  (2.30)

Figure 6 is the complete model of the optimum filter and the message
model. Equation (2.29) is a matrix Riccati differential equation and
is very difficult to solve.

Since the Kalman filter is the minimum variance unbiased
estimator of the signal, $\mathbf{x}(t)$, given only the observation $\mathbf{z}(t)$,
anything less than complete a priori information [Equations (2.21),
(2.22), (2.23)] quickly degrades system performance. Errors in
Kalman filtering can arise from many sources. These errors include an
incorrect model for the system dynamics, incorrect statistics
describing the time history of the covariance of the plant noise, $\mathbf{Q}(t)$,
and an incorrect covariance matrix, $\mathbf{R}(t)$, for the measurement noise.

Any of the errors listed, plus any other statistical inaccuracies,
result in a suboptimal Kalman gain. These errors not only produce an
incorrect Kalman gain but they make it difficult to solve the matrix
Riccati differential equations, which are sensitive to errors. The
results of the Kalman filter will not be used directly but the
mathematical minimization will be employed in the derivation of the proposed
stochastic adaptive filter.
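
To make the role of the Riccati equation concrete, the scalar,
time-invariant form of Equation (2.29) can be integrated numerically
and compared with its steady-state root. The sketch below assumes
constant scalar values for F, G, H, Q and R (all hypothetical) and a
simple forward Euler step; it is not the method used in the proposed
processor.

```python
import math


def riccati_rhs(p, f, g, h, q, r):
    """Scalar form of Equation (2.29): 2 F P - P^2 H^2 / R + G^2 Q."""
    return 2.0 * f * p - (p * h) ** 2 / r + g * g * q


def integrate_riccati(p0, f, g, h, q, r, dt=1e-3, t_end=10.0):
    """Forward Euler integration of the scalar Riccati equation from P(0) = p0."""
    p = p0
    for _ in range(int(round(t_end / dt))):
        p += dt * riccati_rhs(p, f, g, h, q, r)
    return p


def steady_state(f, g, h, q, r):
    """Positive root of the right-hand side set to zero."""
    return (r / h ** 2) * (f + math.sqrt(f * f + h * h * g * g * q / r))


if __name__ == "__main__":
    params = dict(f=-1.0, g=1.0, h=1.0, q=1.0, r=0.5)  # assumed values
    p_final = integrate_riccati(1.0, **params)
    gain = p_final * params["h"] / params["r"]  # scalar form of Equation (2.28)
    print(p_final, steady_state(**params), gain)
```

Changing the assumed Q or R in this sketch changes both the
steady-state covariance and the gain of Equation (2.28), which is the
sensitivity to incorrect statistics discussed above.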

2.3.3 Implementation of Multichannel Wiener Filters  The most
obvious implementation of the Wiener filter uses the inversion of the
$\mathbf{R}_{xx}$ matrix. This idea can be dismissed because if there are many
sensors, a real time computation of the matrix inverse is not practical.
Iterative techniques have distinct advantages over the direct matrix
inversion in that storage and time delay for computation of $\mathbf{R}_{xx}^{-1}$ are
not required, and the iterative algorithms result in systems where the
optimum filter can be updated as the environment is sampled.

Baird (12) has implemented a matrix inversion routine that is
part of an adaptive array processor. This routine, based on the matrix
inversion lemma, approximates the matrix inversion by a recursive
algorithm. While this approximate matrix inversion takes longer than the
basic gradient procedures, it is much faster than true matrix inversion
and offers possibilities for real time use.

One could also implement a multi-input, single output Wiener
filter in the frequency domain (Goode, 50). One could take the fast
Fourier transform (FFT) of the input vector $\mathbf{x}(t)$ and multiply the
result by the optimum filter vector $\mathbf{h}(\omega)$ defined by Equation (2.17).
The Fourier coefficients obtained from the FFT are multiplied by
$[h_j(\omega)]_r$ and $[h_j(\omega)]_i$, the real and imaginary parts of the $j$th
weight, and the corresponding products for all the sensors are summed
to form the beam output at frequency $\omega$. In a real system there would
be a discrete number of values, $N$, for the frequency $\omega$, and the
total number of weights required to form a beam over the entire
bandwidth of interest is $2KN$. For a multiple output system, this
approach requires a large amount of hardware.

The approach taken here is also a frequency domain approach, like
the one described above. However, there are many differences. One
difference is the fact that the proposed system does not use the sensor
outputs directly but instead uses preformed beams. The system is not
only a multi-input but also a multi-output system.

Another approach is to adjust the weights of a tapped delay line
multi-channel filter to minimize mean-square error. This time domain
approach of tapped delay lines is the application of the LMS algorithm
(Widrow et al., 133). A tapped delay line filter (Figure 7) is used to
approximate the general continuous time filter, with the approximation
improving as the time delays become smaller and the number of taps
increases. This system can increase in complexity very rapidly if it
has to operate over a wide frequency range. Since the tapped delay
line filter structure is the most common implementation of the Wiener
filter in the adaptive literature, the following section will show the
derivation of the optimum tapped delay filter and the approximations
used to implement the filter structure in discrete form.

2.4 Tapped Delay Line Filters

2.4.1 Optimum Filters  Using the criterion for a minimum from
the regular calculus, we can optimize our performance criterion and find
the optimum filter coefficients, $h_i$ $(i = 1, 2, \ldots, KL)$. To find the
optimum filter, the mean square error (MSE) performance criterion

$$E\{[d_n - y_n]^2\} = E\{e_n^2\}$$  (2.31)

is used because of its mathematical tractability and its physical
meaning in terms of power. If we take the expected value of the MSE
[Equation (2.31)], differentiate the resulting expression with respect
to the weights, and set the resulting expression to zero, we can obtain
the optimum tapped delay line filter. Since

$$E\{e_n^2\} = E\{d_n^2 - 2 d_n y_n + y_n y_n^T\} ,$$  (2.32)

then, with $y = \mathbf{h}^T\mathbf{x}$ and $\mathbf{x} = \mathbf{s} + \mathbf{n}$, we get (dropping the sample
number $n$)

$$E\{e^2\} = E\{d^2 - 2 d\,\mathbf{h}^T\mathbf{x} + (\mathbf{h}^T\mathbf{x})(\mathbf{h}^T\mathbf{x})^T\} .$$  (2.33)

Taking the indicated expected values gives:

$$E\{e^2\} = E\{d^2\} + \mathbf{h}^T\mathbf{R}_{xx}\mathbf{h} - 2\,\mathbf{h}^T\mathbf{r}_{xd} .$$  (2.34)

If we differentiate the above expression with respect to $\mathbf{h}$, we get

$$\nabla\{E(e^2)\} = 2\,\mathbf{R}_{xx}\mathbf{h} - 2\,\mathbf{r}_{xd} .$$  (2.35)

The optimal solution occurs when the gradient is zero. Thus, we get

$$\mathbf{h}_{OPT} = \mathbf{R}_{xx}^{-1}\,\mathbf{r}_{xd}$$  (2.36)

which is, in fact, the form of solution obtained from the Wiener-Hopf
equation.
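
A small numerical instance of Equations (2.34) through (2.36) may help
fix ideas. The sketch below solves the normal equations for a
two-weight filter by Gaussian elimination; the correlation matrix,
cross correlation vector and signal power are hypothetical numbers
chosen only so that the result can be checked by hand.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][j] * x[j] for j in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def mean_square_error(h, r_xx, r_xd, d_power):
    """Equation (2.34): E{d^2} + h^T R_xx h - 2 h^T r_xd."""
    n = len(h)
    quad = sum(h[i] * r_xx[i][j] * h[j] for i in range(n) for j in range(n))
    cross = sum(hi * ri for hi, ri in zip(h, r_xd))
    return d_power + quad - 2.0 * cross


# Hypothetical second-order statistics for a two-weight filter.
R_XX = [[2.0, 0.5], [0.5, 1.0]]
R_XD = [1.0, 0.3]
D_POWER = 1.0

h_opt = solve(R_XX, R_XD)  # Equation (2.36)
mse_opt = mean_square_error(h_opt, R_XX, R_XD, D_POWER)
```

Any perturbation of h_opt increases the value returned by
mean_square_error, which is the statement that the gradient of
Equation (2.35) vanishes at the optimum.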

Let us recall that the optimum linear filter developed by Wiener
has from Equation (2.17) the form

$$\mathbf{h}_o(\omega) = \mathbf{G}_{xx}^{-1}(\omega)\,\mathbf{g}_{dx}(\omega) .$$  (2.37)


A large body of mathematics is devoted to approximating a
function by using a series of orthogonal functions. We can represent
the desired optimum linear filter as a sum of the following form of
orthogonal functions:

$$H(\omega) = \sum_{n=0}^{N} A_n\,\phi_n(\omega)$$  (2.38)

If  we  use  a weighted  mean  square  deviation  as  our  closeness 
criterion  and  if  we  use  an  infinite  number  of  terms  in  our  approximation, 
then  the  desired  filter  can  be  represented  with  arbitrary  accuracy.  The 
polynomial  approximation  to  the  desired  transfer  function  of  the  filter 
can  then  be  converted  into  a rational  fraction  (for  system  use)  using 
Padé approximants. This method is very sophisticated and extremely
difficult to implement. In practice it is easier to find simple
approximations to the desired transfer function.

2.4.2 Discrete Approximations  Recall that one particular
implementation of the Wiener filter is the tapped delay line filter.
This filter consists of a tapped delay line with adjustable weights at
each tap. For this implementation, the filter functions, $\phi_n$, from
Equation (2.38) then represent delayed versions of the input signal.
In this case,

$$h(\omega) = \sum_{k=0}^{\infty} h_k \exp(-j\omega k\Delta) ,$$  (2.39)

where $\Delta$ is the time increment between delay line taps, and $h_k$ is
the weight at the $k$th tap.
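
A truncated version of the sum in Equation (2.39) is easy to evaluate
directly. The sketch below is illustrative only: it assumes a finite
number of taps, and the four equal weights and the 1 ms tap spacing are
hypothetical values chosen so that the response has an obvious null.

```python
import cmath
import math


def tapped_delay_response(weights, omega, delta):
    """Finite form of Equation (2.39): sum over k of h_k exp(-j omega k delta)."""
    return sum(h_k * cmath.exp(-1j * omega * k * delta)
               for k, h_k in enumerate(weights))


# Hypothetical example: four equal weights, taps spaced 1 ms apart.
WEIGHTS = [0.25, 0.25, 0.25, 0.25]
DELTA = 1e-3

dc_gain = abs(tapped_delay_response(WEIGHTS, 0.0, DELTA))
# At 250 Hz the four phasors are a quarter turn apart and sum to zero.
null_gain = abs(tapped_delay_response(WEIGHTS, 2.0 * math.pi * 250.0, DELTA))
```

Adjusting the weights reshapes this response, which is the sense in
which the tapped delay line approximates a desired transfer function.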

A particular  implementation  of  the  tapped  delay  line  filter  is 
shown  in  Figure  7.  For  this  system,  the  input  signals  are  in  discrete 
sampled  data  form. 

The  preceding  section  showed  how  the  tapped  delay  line  filter 
could  be  obtained  as  an  approximation  to  the  optimum  Wiener  filter.  If 
we  knew  the  statistical  properties  of  both  the  desired  signal  and  the 
noises,  then  it  might  be  possible  to  determine  the  optimum  linear  Wiener 
filter.  Usually  only  part  of  these  statistics  are  known  and  the  other 
statistics  are  unavailable.  If  we  assume  knowledge  of  either  signals 
or  noise,  then  it  would  be  possible  to  develop  an  optimum  system  using 
these  statistics.  In  most  adaptive  systems,  either  the  statistics  of 
the  desired  signal  or  the  correlation  functions  of  the  desired  signal 
are  assumed  known.  Since  part  of  the  complete  statistical  information 
is  unknown,  adaptation  is  performed  by  adjusting  the  filter  coefficients 
according to some performance measure. In all that follows, it is
assumed that the random functions involved in the adaptive processor
are  stationary  random  processes.  The  most  common  performance  measure 
used  is  the  minimum  mean  square  error  between  the  desired  signal  and 
the  output  of  the  filter.  This  performance  criterion  is  used  because 
it  represents  a tractable  mathematical  function. 

The  most  common  adaptive  implementation  of  the  tapped  delay  line 
filter  is  the  Widrow  LMS  algorithm.  This  algorithm  will  be  described 
in  Section  2.5.  The  accuracy  of  the  tapped  delay  line  filter  increases 
as  the  number  of  weights  becomes  large  and  the  time  difference  between 
the  taps  becomes  small. 

The  adaptive  algorithm  proposed  in  this  research  is  the  adaptive 
realization  of  the  frequency  domain  version  of  the  optimum  linear  Wiener 
filter.  It  is  a much  more  practical  implementation  of  the  Wiener  filter 
when  the  number  of  sensors  (not  beams)  becomes  large. 

2.5 Adaptive Algorithms

The following section contains a summary of the adaptive
processors reported in the literature. It contains a short account of the
approximations, techniques of implementation, and varied use of
adaptive processors.

This  section  has  been  written  to  put  into  perspective  the 
relations  between  the  complex  frequency  domain  stochastic  adaptive 
processor  and  the  previous  attempts  at  adaptive  processors  which  were 
almost  all  time  domain  deterministic  methods. 

Urkowitz (124) considers the detection and estimation of a signal
field in the presence of a noise field. He uses a Karhunen-Loeve
expansion, a generalization of Fourier series, to obtain a series
representation, with uncorrelated coefficients, of the random process of
transducer outputs. If one looks at the complexity in the generation
of the test statistic (likelihood ratio) used in this detection system,
one can quickly grasp the fact that an implementation of the complete
optimal system is extremely complicated.

One of the difficulties in implementing the general optimum array
processor is the inversion of the $\mathbf{R}_{xx}$ matrix. For a system of any
appreciable size, an on-line implementation of straightforward matrix
inversion is impractical. Processors based on this direct computation
procedure are poorly suited for on-line calculations, since they require
tedious computations which must be repeated if the environment changes. To
avoid the computation problems associated with the direct calculation
of the optimal filter parameters, several adaptive procedures have been
developed. These adaptive algorithms have the advantages that the
required computations are generally much simpler and that they can be
continually updated, thereby accounting for time variations in the
environment.

Many authors have attempted to apply deterministic gradient
minimization to derive the matrix filter that realizes an adaptive
antenna system in the time domain. Widrow's LMS algorithm adaptively
formed patterns that placed nulls in the beam pattern at the spatial
locations of the noise sources. This adaptive processor performs
filtering in space. By far, the largest class of adaptive array processors
consists of the various forms (Daniell, 30; Frost, 47; Goode, 50;
Griffiths, 52) of the Widrow LMS algorithm, which use the tapped delay
line as their basic filter structure. All of these LMS algorithms,
except Frost's, use steepest descent to calculate the adaptive weight
vector. There are many variations of the basic LMS algorithm and they
differ only in their assumptions of known statistics. For the basic LMS
algorithm of the form

$$\mathbf{w}_{n+1} = \mathbf{w}_n + \mu\,[d_n\mathbf{x}_n - \mathbf{x}_n\mathbf{x}_n^T\mathbf{w}_n] ,$$  (2.40)

it is assumed that the desired signal or pilot signal, $d(n)$, is
known and an input cross correlation must be calculated. Griffiths (51)
assumes knowledge of cross correlations between the observed signal
vector, $\mathbf{x}_n$, and the target signal, $d_n$.

The gain constant $\mu$ not only determines the speed of
convergence, but also the misadjustment or noisiness of the estimation process.
All of the LMS type algorithms have in common that they
use the tapped delay line as the filter model and deterministic gradient
type steepest descent to calculate the adaptive filter weights. The
LMS type algorithms assume that the direction of the desired signal is
known a priori and they use this knowledge to put specified gains on
both the desired signal and the noises by adjusting the antenna or
directivity pattern.
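
The behavior of the basic LMS recursion can be seen in a short
simulation. This is a minimal sketch under simplifying assumptions:
the input vectors are drawn independently with unit variance rather
than taken from a tapped delay line, the desired signal is generated
from hypothetical "true" weights plus noise, and the function names,
gain constant and sample count are illustrative.

```python
import random


def lms(samples, n_taps, mu):
    """Basic LMS recursion: w <- w + mu * (d*x - x*(x^T w))."""
    w = [0.0] * n_taps
    for x, d in samples:
        y = sum(wi * xi for wi, xi in zip(w, x))  # filter output x^T w
        w = [wi + mu * (d - y) * xi for wi, xi in zip(w, x)]
    return w


def make_samples(count, true_weights, noise_std, seed=0):
    """Generate (x, d) pairs with d = true_weights^T x + white noise."""
    rng = random.Random(seed)
    samples = []
    for _ in range(count):
        x = [rng.gauss(0.0, 1.0) for _ in true_weights]
        d = sum(t * xi for t, xi in zip(true_weights, x))
        d += rng.gauss(0.0, noise_std)
        samples.append((x, d))
    return samples


TRUE_WEIGHTS = [0.5, -0.3]  # hypothetical optimum weights
w_final = lms(make_samples(20000, TRUE_WEIGHTS, 0.1), n_taps=2, mu=0.01)
```

For this input the optimum weights equal TRUE_WEIGHTS, so the distance
between w_final and TRUE_WEIGHTS shows the residual noisiness left by
the gain constant; a larger gain converges faster but leaves a noisier
estimate.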

Frost's algorithm (47) belongs to the class of LMS algorithms
but Frost uses a projected steepest descent algorithm to find the
optimum filter. His algorithm requires knowledge of the input cross
correlation matrix and detailed knowledge of the signal and noise geometry
used in the formulation of the constraint which maintains a chosen
frequency characteristic for the array in the direction of interest.
These characteristics, which are formulated in the form of a constraint,
are very difficult to design for general classes of array problems.

R. L. Riegler and R. T. Compton (101) also used a steepest
descent minimization of mean square error. In this system, no a priori
information about the angles of arrival of signals was required.
However, certain statistical characteristics of the desired signal had to
be known. A reference signal replica of the desired signal must be
available and the input cross correlation matrix is also needed to
calculate the experimental mean square error. They found, as was found
in this research, that minimizing mean square error is equivalent to
maximizing S/N for all but very low input S/N. Their filter structure
was a form of the tapped delay line filter used by all the Widrow LMS
algorithms.

Zahm (139) extended the technique of power equalization developed
by Riegler and Compton to a wider band of signals. The power
equalization technique is based on proportional feedback control which equalizes
the power out of the array of sensing elements due to the desired signal
and an interference source.

Schwartz and Winkler (113) use Rosen's projected gradient
algorithm to design an iterative algorithm to minimize the mean square
error or maximize the S/N subject to certain linear constraints. Their
algorithm is very similar to Frost's algorithm in that both use
projected gradient algorithms. The Schwartz and Winkler algorithm
assumes knowledge of both a desired signal and the cross spectrum
between the desired and received signals.

The side lobe canceller (Howells, 61) is a system which has a
main beam and forms an auxiliary beam from elements distributed over
the face of the array. Auxiliary beam signals are subtracted with the right
amplitude and phase from the main beam to cancel any interference not in
the main lobe. It is known that in the case of non-zero bandwidth
interference, the side lobe canceller does not give perfect cancellation
because the auxiliary beam does not contain the exact replica of the
main beam signal. This effect is more pronounced the larger the
cancellation system's fractional bandwidth. To minimize these bandwidth
effects, one must use multiple auxiliary beams and cancellation loops.
The objective is to distribute auxiliary beam elements over the face of
the array as much as possible in the same way as the elements of the
main beam are distributed. Thus, the side lobe canceller cannot be used
for cancelling wideband interference unless the number of cancellation
loops is made very large.

Claerbout  (25)  designed  a processor  in  which  signal  information 
is  given  in  the  form  of  various  constraints  on  the  filter  coefficients 
rather  than  being  given  as  a signal  correlation  function  in  the  design 
of  least-squares  filters. 

Kobayashi (70) used both the methods of steepest descent and conjugate
gradients to design an adaptive filter, based on the data taken over
some fitting interval, that minimizes the output noise power without
distorting the signal. His algorithm requires knowledge of the input
cross correlation matrix or cross power spectral matrix. He applied his
algorithm to a seismic processing system used to detect earthquakes.
Another method of calculating a set of weight values for a
tapped-delay-line multichannel filter is called maximum-likelihood-ratio
(MLR) processing. This method has been applied by Capon et al. (23) to
the processing of large aperture seismic array (LASA) data.
Maximum-likelihood-ratio processing assumes that the velocity and
directional properties of the target signal are known a priori. From a
knowledge of the spatial characteristics of the target, one is able to
design spatial correction filters. The outputs of the spatial correction
filters are then used as inputs to the tapped-delay-line processor
which determines the filter weights.

Brennan and Reed (19) developed a theory for an adaptive processor
which maximizes the probability of detection for a fixed false alarm
rate. Their derivation, however, was incorrect because they took
derivatives of non-analytic complex functions.

Mueller and Spaulding (90) used the gradient steepest descent
algorithm with minimum mean square error as the performance measure to
determine a technique for start-up of adaptive transversal filter
(tapped delay line) equalizers used in high speed synchronous data
communications. They use a specially derived training sequence for the
required pilot signal to obtain the data line synchronization. Their
technique is called cyclic equalization.

Sondhi (117) used a technique similar to that of Mueller and Spaulding
to obtain echo cancellation in long distance telephone communications.
Both of these communications systems use the tapped delay line filter
as their filter structure.
It is true of all on-line implementations of estimation algorithms,
whether they are optimal implementations or adaptive realizations, that
each has its limitations. It has been shown by Comer (27) that
algorithms with a constant gain, $\gamma_o$, have comparatively little
noise resistance. Furthermore, in the presence of measuring error with
variance $\sigma^2$, convergence in the usual sense does not occur;
instead we get

$$\lim_{n\to\infty} E\{\|\mathbf{w}_n - \mathbf{w}\|^2\} < \mathrm{Var}(\gamma_o, \sigma^2), \tag{2.41}$$

where

$$\mathrm{Var}(\gamma_o, \sigma^2) \to 0 \quad \text{as} \quad \gamma_o \to 0. \tag{2.42}$$

The  only  way  to  reduce  the  variance  of  the  weights  is  to  make 
the  gain  constant  very  small.  This  technique  results  in  an  inordinate 
amount  of  time  to  get  convergence  and  makes  the  gradient  techniques 
inapplicable  for  on-line  use.  The  precise  manner  in  which  the  gain  in 
stochastic  approximation  is  varied,  which  is  exactly  that  suggested 
above,  reduces  the  effect  of  additive  measurement  noise  to  the  point 
where  the  variance  of  the  process  goes  to  zero. 
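
The trade-off between a constant gain and a decreasing gain can be illustrated with a minimal scalar sketch. Everything in it is an assumption made for illustration and is not the processor of this research: a scalar weight, Gaussian measurement noise, the constant gain 0.1, and the function name `run_recursion`.

```python
import random

def run_recursion(gain, n_steps=20000, true_value=2.0, noise_std=1.0, seed=0):
    """Iterate w_{n+1} = w_n + gain(n) * (z_n - w_n) on noisy measurements z_n.

    Returns the final estimate and the squared error averaged over the
    last quarter of the run (a rough proxy for the residual variance).
    """
    rng = random.Random(seed)
    w = 0.0
    tail_sq_err = []
    for n in range(1, n_steps + 1):
        z = true_value + rng.gauss(0.0, noise_std)  # measurement with additive noise
        w += gain(n) * (z - w)
        if n > 3 * n_steps // 4:
            tail_sq_err.append((w - true_value) ** 2)
    return w, sum(tail_sq_err) / len(tail_sq_err)

w_const, resid_const = run_recursion(lambda n: 0.1)      # constant gain
w_decay, resid_decay = run_recursion(lambda n: 1.0 / n)  # decreasing gain
print(f"constant gain: residual {resid_const:.5f}")
print(f"1/n gain:      residual {resid_decay:.5f}")
```

In this toy setting the constant-gain estimate keeps fluctuating about the true value, while the 1/n gain averages the noise away, which is the behavior the paragraph above attributes to stochastic approximation.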

As  was  noted  before,  the  tapped  delay  line  filter  only 
approximates  the  continuous  time  filter.  In  a broadband  system,  there 
would  have  to  be  a large  number  of  taps  with  small  time  delays  between 
successive  taps  to  get  a good  representation.  The  stochastic 
approximation  algorithm  in  this  research  is  potentially  more  powerful 
than non-stochastic algorithms because it has the statistical nature of
the system built into the adaptive minimization problem and, since it is
specifically a frequency domain technique, it is designed to operate
over a wide range of frequencies.

There  are  many  other  authors  who  have  tried  to  realize  adaptive 
filters.  To  try  and  give  a description  of  each  one's  work  would  be 
impractical.  A partial  listing  of  the  more  important  methods  has  been 
given  here.  The  references  contain  a large  survey  of  both  the  methods 
described  and  many  others. 


CHAPTER  III 


STOCHASTIC  APPROXIMATION  AND  PROBABILISTIC  PROOFS 
3.0  Introduction 

In this chapter, a brief summary of the theorems and convergence
criteria of both the Robbins-Monro and the Kiefer-Wolfowitz stochastic
approximation algorithms is given. We show the evolution and weakening
of the convergence conditions of stochastic approximation and how these
techniques have been applied to many diverse areas. The importance of
the variable gain sequence, $\gamma_n$, for both rapid convergence and
statistical smoothness of the stochastic algorithm is emphasized. These
are important reasons why the stochastic approximation algorithm is more
useful than deterministic algorithms. The last part of Section 3.1 lists
some of the many areas in which the classical stochastic approximation
techniques have been applied.

We formally state the classical projection theorem and use a theorem
formulated by Karlin and Taylor (66) to establish the necessary and
sufficient conditions for both minimum mean square error and linear
minimum mean square error. It is shown that linear estimators are just
a subset of general minimum mean square error estimators and that the
optimum linear estimator has a simple geometrical interpretation.

The analysis in Section 3.2.1 shows that the optimum linear solution
to the minimum mean square error problem can be derived from the
orthogonal projection lemma. The non-analytic performance measure (in
the complex case) makes it impossible to derive the optimum filter by
ordinary calculus, which is the reason the orthogonal projection lemma
is used for the minimization instead. The optimum matrix filter for
minimum mean square error is derived from the necessary and sufficient
conditions for minimization obtained from Theorem 3.6. This theorem is
just the stochastic version of the classical (non-stochastic)
orthogonal projection theorem (Theorem 3.3).

We derive the regression function for the recursive stochastic
approximation algorithm, which is used to derive the matrix adaptive
filter for the experimental tests. The various modes of probabilistic
convergence are established and their interrelations are compared. It
is shown why non-probabilistic convergence criteria make no sense when
the underlying processes are stochastic processes.

We prove convergence of the stochastic adaptive algorithm under the
convergence criteria of stochastic approximation and establish the
important result that the sequence of solutions of the adaptive filter
is a martingale. The fact that the sequence of solutions is a
martingale not only establishes the convergence of the stochastic
algorithm but also is used to show the stability of the stochastic
adaptive algorithm in terms of a stochastic control system.

3.1  History  of  Stochastic  Approximation  Techniques 

While the concepts involved in the use of stochastic approximation are
relatively simple to understand, they are mathematically difficult. The
convergence conditions of stochastic approximation are difficult to
establish on a global basis. Many authors have dealt with extensions
and modifications of the original Robbins-Monro (102) and
Kiefer-Wolfowitz (68) processes and have devised ways to adapt
stochastic approximation techniques to diverse applications. The
following section traces the history of the Robbins-Monro and
Kiefer-Wolfowitz methods and shows the evolution of the convergence
criteria and their extension to a wider range of minimization or root
finding problems than those of the original derivations.

3.1.1 Robbins-Monro Procedure. Stochastic approximation had its
beginnings in Robbins and Monro's classic paper. Robbins and Monro
formulated their minimization problem and their convergence conditions
by presenting the following theorem. The sequence $\gamma_n$ is the same
type of sequence used in this research.

Theorem 3.1 (Robbins and Monro, 102)

Let $r(x)$ be a given function and $\alpha$ a given constant such that
the equation $r(x) = \alpha$ has a uniquely defined root $x = \theta$.
Let $y(x)$ be a realization of a measurement. Assume $y(x)$ has
distribution $P[y(x) \le y] = g(y|x)$ such that
$r(x) = \int_{-\infty}^{\infty} y \, dg(y|x)$ (i.e., $r(x) = E\{y|x\}$).
Choose $x_1$ arbitrary and define the recursive relation

$$x_{n+1} = x_n + \gamma_n [\alpha - y_n(x_n)].$$

Let the sequence $\gamma_n$ be of the form $1/n$ and assume there exists
some constant $c > 0$ such that

$$P(|y(x)| \le c) = 1, \tag{3.1}$$

and that the conditions

A. $r(x)$ is a nondecreasing function,

B. $r(\theta) = \alpha$,

C. $r'(\theta) > 0$, where $r'(\theta) = dr/dx$ evaluated at $x = \theta$,

are satisfied. Then the recursive relation

$$x_{n+1} = x_n + \gamma_n \{\alpha - y_n(x_n)\} \tag{3.2}$$

and the preceding assumptions imply the result

$$\lim_{n\to\infty} E\{(x_n - \theta)^2\} = 0. \tag{3.3}$$
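
The recursion of Theorem 3.1 can be sketched in a few lines. The regression function tanh(x), the uniform noise, the target level 0.5, and the names `robbins_monro` and `measure` are assumptions chosen for illustration so that the measurements stay bounded and r(x) is nondecreasing; they are not taken from the original paper.

```python
import math
import random

def robbins_monro(measure, alpha, x1=0.0, n_steps=20000):
    """Iterate x_{n+1} = x_n + gamma_n * (alpha - y_n(x_n)) with gamma_n = 1/n."""
    x = x1
    for n in range(1, n_steps + 1):
        x += (1.0 / n) * (alpha - measure(x))
    return x

rng = random.Random(1)

def measure(x):
    # Hypothetical measurement: r(x) = tanh(x) plus bounded noise, so |y(x)| <= 1.1.
    return math.tanh(x) + rng.uniform(-0.1, 0.1)

alpha = 0.5
theta = math.atanh(alpha)            # true root of r(x) = alpha
theta_hat = robbins_monro(measure, alpha)
print(f"true root {theta:.4f}, estimate {theta_hat:.4f}")
```

Only noisy measurements y(x) are ever used; the function r(x) itself is never evaluated by the recursion.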


Wolfowitz (136) weakened the convergence conditions of the
Robbins-Monro theorem so that if the regression function satisfied

$$|r(x)| < c, \tag{3.4}$$

then the Robbins-Monro process $x_n$ converges in probability to
$\theta$.

Blum (16) weakened the convergence conditions and required that the
regression function lie between two lines.

Friedman (46) further weakened the convergence conditions. Friedman's
theorem enables one to construct a convergent process when $|r(x)|$ and
$\sigma^2(x)$ are bounded by known functions $f_1(x)$ and $f_2(x)$. One
then takes

$$f(x) = \max[f_1(x), f_2(x)]^{1/2} \tag{3.5}$$

and $r(x)$, the regression function, has only to satisfy

$$r(x) < [\ell |x| + k] \tag{3.6}$$

for positive constants $\ell$ and $k$. The Robbins-Monro process then
becomes

$$x_{n+1} = x_n + \gamma_n [\alpha - y_n(x_n)] / f(x_n). \tag{3.7}$$

Gladyshev  (47)  simplified  the  conditions  for  convergence  and 
established  proofs  whereby  the  algorithm  converged  with  probability  one. 
It  is  on  his  theorem  that  most  of  the  convergence  proofs  of  stochastic 
approximation  are  based. 

Blum (16) generalized the Robbins-Monro procedure to the
multidimensional case. It is on a matrix version of Blum's
multidimensional procedure that the method in this research is based.

3.1.2 Kiefer-Wolfowitz. Kiefer and Wolfowitz suggested a method to
estimate the maximum of a regression function where the regression
function is not directly available. Since more measurements are
necessary at each step to form some approximation to the regression
function, the Kiefer-Wolfowitz process exhibits slower convergence
properties than the Robbins-Monro process. They proposed the following
theorem and the resultant convergence conditions. It is to be noted
that the gain sequence, $a_n/c_n$, used in the method is analogous to
the sequence $\gamma_n$ used in the Robbins-Monro method.
Theorem 3.2 (Kiefer and Wolfowitz, 68)

Let $r(x)$ be a regression function and $F(y|x)$ a family of
distribution functions, and assume that the following condition is
satisfied:

$$\int_{-\infty}^{\infty} [y(x) - r(x)]^2 \, dF(y|x) < \sigma^2 < +\infty. \tag{3.8}$$

Assume that $r(x)$ is strictly increasing for $x < \theta$, and that
$r(x)$ is strictly decreasing for $x > \theta$.

Let $\{a_n\}$ and $\{c_n\}$ be infinite sequences of positive real
numbers such that

$$\sum_{n=1}^{\infty} a_n = \infty, \quad \sum_{n=1}^{\infty} a_n c_n < \infty, \quad \sum_{n=1}^{\infty} a_n^2 c_n^{-2} < \infty. \tag{3.9}$$

Then the recursive scheme defined by

$$x_{n+1} = x_n + \frac{a_n}{c_n} [y(x_n + c_n) - y(x_n - c_n)] \tag{3.10}$$

converges in probability to the maximum, $\theta$, of the regression
function $r(x)$ if certain regularity conditions are satisfied.
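
A minimal sketch of recursion (3.10) follows. The quadratic regression function, the Gaussian noise, and the particular sequences a_n = 0.5/n and c_n = n^(-1/3) are assumptions chosen for illustration; they satisfy the summability conditions (3.9), but they are not the sequences of the original paper.

```python
import random

def kiefer_wolfowitz(measure, x1=0.0, n_steps=20000):
    """Iterate x_{n+1} = x_n + (a_n / c_n) * [y(x_n + c_n) - y(x_n - c_n)]."""
    x = x1
    for n in range(1, n_steps + 1):
        a_n = 0.5 / n              # sum of a_n diverges
        c_n = n ** (-1.0 / 3.0)    # sums of a_n*c_n and (a_n/c_n)^2 converge
        x += (a_n / c_n) * (measure(x + c_n) - measure(x - c_n))
    return x

rng = random.Random(2)

def measure(x):
    # Hypothetical regression function r(x) = -(x - 1)^2 with its maximum at theta = 1,
    # observed with additive Gaussian noise.
    return -(x - 1.0) ** 2 + rng.gauss(0.0, 0.1)

theta_hat = kiefer_wolfowitz(measure)
print(f"estimated location of the maximum: {theta_hat:.3f}")
```

Two noisy measurements are needed per step to form the finite-difference slope, which is the extra cost relative to the Robbins-Monro process mentioned before the theorem.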

Many  authors,  notably  Venter  (130),  have  weakened  the  convergence 
conditions  for  the  Kiefer-Wolfowitz  process. 

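One can construct a multidimensional Kiefer-Wolfowitz process by
forming 2N observations of the random vector

$$\mathbf{x}_n \pm c_n \mathbf{e}_j, \quad j = 1, \ldots, N, \tag{3.11}$$

where the $\mathbf{e}_j$ are unit vectors. This observation vector is
used to evaluate the regression function in a recursive algorithm of
the form

$$\mathbf{x}_{n+1} = \mathbf{x}_n + \frac{a_n}{c_n} \, \Delta\mathbf{y}(\mathbf{x}_n), \tag{3.12}$$

where the j-th component of $\Delta\mathbf{y}(\mathbf{x}_n)$ is the
difference between the observations made at
$\mathbf{x}_n + c_n \mathbf{e}_j$ and at $\mathbf{x}_n - c_n \mathbf{e}_j$.
The $\{a_n\}$ and $\{c_n\}$ are sequences of positive real numbers that
satisfy

$$\sum_{n=1}^{\infty} a_n = \infty, \quad \sum_{n=1}^{\infty} a_n c_n < \infty, \quad \text{and} \quad \sum_{n=1}^{\infty} a_n^2 c_n^{-2} < \infty. \tag{3.13}$$

Kiefer and Wolfowitz (68) made some highly restrictive assumptions from
which they proved that Equation (3.12) converges with probability one.
Venter (130) relaxed the original convergence criteria of Kiefer and
Wolfowitz and made the technique applicable to a wider variety of cases.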

Most experimenters in stochastic approximation have concluded that the
method should be carried out in two stages. The first stage takes
large step sizes to estimate the most likely region of search, and the
second stage takes smaller step sizes to fine tune the search. Since
the general stochastic approximation algorithm is a local technique
rather than a global one, this separation of search stages is a
logical procedure. There does not exist, however, either in the global
case or the local case, any general method for carrying out the search
or for adjusting the search steps in an optimum manner while still
satisfying the convergence conditions.

Many authors have proposed specific ways to accelerate convergence.
Kesten (65) proposed keeping the gain, $\gamma_n$, constant if the
difference $(x_n - x_{n-1})$ had the same sign as $(x_{n-1} - x_{n-2})$
and decreasing the step size otherwise. Fabian (43) proposed a method
where one would make observations on the regression function in random
directions and pick the direction where the sign of the regression
function was the negative of the others. This method is very similar
to the deterministic method of steepest descent.
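
Kesten's rule can be sketched as a small change to the Robbins-Monro loop: the gain index advances only when successive differences change sign. The regression function tanh(x), the noise level, the distant starting point, and the function names below are assumptions chosen for illustration, not details from Kesten's paper.

```python
import math
import random

def make_measure(seed):
    rng = random.Random(seed)
    def measure(x):
        # Hypothetical regression function r(x) = tanh(x) with bounded noise.
        return math.tanh(x) + rng.uniform(-0.1, 0.1)
    return measure

def plain_robbins_monro(measure, alpha, x1, n_steps):
    x = x1
    for n in range(1, n_steps + 1):
        x += (alpha - measure(x)) / n    # gain 1/n shrinks on every step
    return x

def kesten(measure, alpha, x1, n_steps):
    x = x1
    k = 1                # index into the gain sequence 1/k
    prev_step = 0.0
    for _ in range(n_steps):
        step = alpha - measure(x)        # has the sign of x_{n+1} - x_n
        if step * prev_step < 0.0:       # successive differences changed sign
            k += 1                       # only then move to the next, smaller gain
        x += step / k
        prev_step = step
    return x

alpha, x1, n_steps = 0.5, -20.0, 5000
theta = math.atanh(alpha)                # true root of r(x) = alpha
x_plain = plain_robbins_monro(make_measure(3), alpha, x1, n_steps)
x_kesten = kesten(make_measure(3), alpha, x1, n_steps)
print(f"plain: error {abs(x_plain - theta):.3f}, Kesten: error {abs(x_kesten - theta):.3f}")
```

Starting far from the root, the plain 1/n gain shrinks before the iterate has travelled the required distance, whereas the Kesten variant keeps its large gain until the iterate first overshoots.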

The choice of the sequence $\gamma_n$ is of major importance for the
speed of convergence of the stochastic algorithm and for the statistical
smoothness of the process. No general procedure exists for choosing
this sequence. The rapid speed of convergence for the proposed adaptive
filter has been obtained by considering the least mean square error as
a control system, and then deriving the sequence which allows the
algorithm to operate on the stability limit of the system. This
starting value gives rapid improvement at the expense of initially noisy
results. However, the further application of the infinite sequence
smooths the resulting matrix filter coefficients because of the
built-in statistical smoothing function of the stochastic approximation
algorithm. This smoothing is not possible with deterministic systems.

Kushner (76) has recently attempted to formulate a general theory for
optimum search strategies for stochastic approximation. He has also
tried to formally incorporate linear constraints in the solution of the
general optimization problem solved by stochastic approximation and in
the search strategy.

Mendel and Fu (84) discuss a method to accelerate convergence whereby
they use the orthogonal projection of the current estimate of the
random variable to force the estimate to be within a cube. They use
knowledge of the convergence region of some of the random variables
being estimated to speed the convergence of the total procedure. This
method of orthogonal projection discussed by Mendel and Fu is based on
the same physical considerations as the method used in the matrix
adaptive filter in this research.

Stochastic approximation has been used in a wide variety of
applications. Many authors in pattern recognition (Fu, 46), adaptive
automatic control systems (Holmes, 59), operations research (Albert and
Gardner, 1), and biological research (Cochran and Davis, 26) have
applied the methods of stochastic approximation. Problems in adaptive
control systems fall into the stochastic approximation framework and
yield computationally simple algorithms which require little storage
and can be performed in real time. Stochastic approximation has been
used to estimate probability density functions for application not only
in statistics but also in communication theory.

Chien  and  Fu  (24)  have  used  stochastic  approximation  to  deal 
with  state  estimation  in  stationary  dynamical  systems  where  only  some 
state  components  are  accessible  for  measurement. 

Dupac (40) has extended the basic Robbins-Monro procedure so that it
can be used to estimate parameters which are slowly time varying, and
are varying in either a deterministic manner or in a random fashion
with their (time varying) variances going to zero as time tends to
infinity. This extension has use in autoregressive and moving average
processes, not only in statistics but also in any system that needs
reliable estimates of the mean of a process. One fact that can be
gleaned from this short history of stochastic approximation is that the
applications of the technique have been varied. In the sections that
follow, stochastic approximation will be used to derive a recursive
relation which will give a stochastic adaptive realization of a
multi-input, multi-output Wiener filter to be used as the matrix filter
in an adaptive array processor.

The following section formulates the general minimum mean square
estimation problem, and establishes a theorem whereby the linear minimum
mean square minimization can be solved. We formally state the definition
of a regression function to be used in all the succeeding analysis and
compare the relationship of the various modes of probabilistic
convergence.

Theorem  3.3  is  a formal  statement  of  the  classical  projection 
theorem.  It  should  be  contrasted  with  Theorem  3.4  which  is  the 
stochastic  version  of  the  classical  theorem.  The  stochastic  version 
provides  the  basis  for  finding  the  minimum  mean  square  error  and  the 
analytic  method  whereby  the  optimum  matrix  adaptive  filter  and  the 
regression  function  are  derived. 

Theorem 3.3 (Luenberger, 79)

Let $S$ be a complete metric space. Let $T$ be a closed vector space
which is a subspace of $S$. Let $\mathbf{x}$ be a vector that is in $S$
but not necessarily in $T$. If we let $\mathbf{u}$ be any vector in
$T$, then there exists a unique $\mathbf{u}_o$ in $T$ such that
$\|\mathbf{x} - \mathbf{u}_o\| \le \|\mathbf{x} - \mathbf{u}\|$ for all
$\mathbf{u}$ in $T$. From geometric considerations, the necessary and
sufficient condition that $\mathbf{u}_o$ must satisfy to fulfill the
above conditions is that the vector $(\mathbf{x} - \mathbf{u}_o)$ be
orthogonal to the subspace $T$.
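
The orthogonality condition of Theorem 3.3 can be checked numerically in a small Euclidean example. The vectors and the two-dimensional subspace below are assumptions chosen for illustration, and `project_onto_span` is a hypothetical helper that solves the 2 x 2 normal equations directly.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def project_onto_span(x, t1, t2):
    """Orthogonal projection of x onto the subspace T spanned by t1 and t2."""
    g11, g12, g22 = dot(t1, t1), dot(t1, t2), dot(t2, t2)  # Gram matrix of the basis
    b1, b2 = dot(x, t1), dot(x, t2)
    det = g11 * g22 - g12 * g12
    c1 = (b1 * g22 - b2 * g12) / det   # Cramer's rule on the normal equations
    c2 = (g11 * b2 - g12 * b1) / det
    return [c1 * p + c2 * q for p, q in zip(t1, t2)]

def distance(a, b):
    diff = [p - q for p, q in zip(a, b)]
    return dot(diff, diff) ** 0.5

x = [3.0, 4.0, 5.0]
t1, t2 = [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]   # T is the plane of the first two axes
u_o = project_onto_span(x, t1, t2)
residual = [p - q for p, q in zip(x, u_o)]
other_u = [2.0 * p - 1.0 * q for p, q in zip(t1, t2)]   # some other vector in T

print("u_o =", u_o)
print("residual . t1 =", dot(residual, t1), " residual . t2 =", dot(residual, t2))
print("distance to u_o:", distance(x, u_o), " distance to other_u:", distance(x, other_u))
```

The residual is orthogonal to both basis vectors of T, and any other vector in T lies farther from x than the projection does.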
3.2 Mathematical Preliminaries

3.2.1 Minimum Mean Square Error Estimation. Since we are interested
in estimating the value of a random variable resulting from the
observation of an experiment, we now formulate the mean square error
estimation problem (Karlin and Taylor, 66). Since stationary random
processes are assumed in this research, the estimation problem is
considered in this context. From this formulation will evolve a
theorem which gives the necessary and sufficient conditions for linear
minimum mean square error, which is used in the derivation of the
stochastic adaptive filter.

If we are concerned with the problem of estimating the random variable
$x$ from an observation of past and present values of the random
variable $y$, then the problem is called a prediction problem. The
prediction problem involves construction of certain types of
estimators. The physical problem that concerns us in this research is
more precisely called a filtering problem because it consists of
estimating a random variable based on an observation of a process
containing both signal and noise. However, the estimator designed uses
both past and present values and is used to predict future values. It
is for these reasons that the terms describing the estimator will be
used interchangeably.

We want to estimate the best value for $x$, the result of some
observation. We call $\hat{x}$ our estimate, and we desire to make our
estimation error $(x - \hat{x})$ as small as possible in some
mathematical sense. Since the experimental observations are random
processes, a good probabilistic measure of performance for our
estimator is the mean squared error

$$E\{(x - \hat{x})^2\} = E\{e^2\}. \tag{3.14}$$

It is to be observed that this performance measure is an average
measure: sometimes we will be correct and sometimes we will be wrong
in our estimations, but we will minimize our error on the average.

Thus, we want to find an estimator that minimizes, over all estimators
$\hat{x}$ (both linear and nonlinear), the mean square error
$E\{(x - \hat{x})^2\}$. In most systems, and adaptive systems in
particular, the most important design criterion is the mathematical
tractability of the performance criterion that is chosen. The mean
square error criterion is not only mathematically tractable but also
possesses the realistic physical design property that the error is
minimized in the power sense. Wiener (134) made a major contribution to
the engineering literature when he used the minimum mean square error
criterion to find his optimal estimator, the Wiener filter. The
ultimate result of this research is a recursive algorithm to find the
Wiener filter, and the results of Theorem 3.6 are the cornerstones of
the derivation of the adaptive algorithm.

The mean square error is defined as the error over a band of
frequencies. Since each term of the mean square error at every
discrete frequency in the band is positive or zero, each term can be
minimized separately. In this research the minimum mean square error is
found at each frequency in the band of interest. Since each term is
positive or zero, we can guarantee that the mean square error is minimum
over the entire band of interest.
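
The term-by-term argument can be written out as a short worked equation. The notation is assumed here for illustration: $f_1, \ldots, f_K$ are the discrete frequencies in the band, $H(f_k)$ denotes the filter coefficients at $f_k$, and $e(f_k)$ is the error at that frequency. The separation also relies on the error at $f_k$ depending only on the coefficients $H(f_k)$ chosen for that frequency.

```latex
\begin{align*}
\mathrm{MSE} &= \sum_{k=1}^{K} E\{|e(f_k; H(f_k))|^2\},
  \qquad E\{|e(f_k; H(f_k))|^2\} \ge 0, \\
\min_{H(f_1), \ldots, H(f_K)} \mathrm{MSE}
  &= \sum_{k=1}^{K} \, \min_{H(f_k)} E\{|e(f_k; H(f_k))|^2\}.
\end{align*}
```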

The  following  analysis  (Karlin  and  Taylor,  66)  illustrates  the 
fact  that  if  linear  estimators,  as  realistic  approximations  to  true 
minimum  mean  square  error  estimators,  are  desired,  then  some  degradation, 
except  for  Gaussian  statistics,  of  performance  from  the  true  minimum 
mean  square  error  estimation  must  be  expected. 

If  we  assume  that  y is  the  result  of  an  experiment  from  which 
we  wish  to  estimate  a value  of  x , then  for  the  jointly  distributed 
random  variables  x and  y we  allow  any  estimator  having  finite 
variance. 

In the case where the outcome of an experiment, $x$, has a known mean,
$\mu$, the best predictor for $x$ in the mean square error sense is
$\hat{x} = \mu$.


We could now argue that in the general case the appropriate choice for
our estimator is the mean of $x$ computed from the conditional
distribution, $\mu_{x|y} = E\{x|y\}$. This can be shown to be true by
the following example (Karlin and Taylor, 66). We compute the mean
square error as

$$E\{(x - \hat{x})^2\} = E\{(x - \mu_{x|y})^2\} + 2E\{(x - \mu_{x|y})(\mu_{x|y} - \hat{x})\} + E\{(\mu_{x|y} - \hat{x})^2\}. \tag{3.15}$$

The second term in Equation (3.15) can be shown to be zero by
evaluating the conditional expectations indicated. Expanding the second
term by use of identities from Appendix C, we get

$$E\{(x - \mu_{x|y})(\mu_{x|y} - \hat{x})\} = E\{E[(x - \mu_{x|y})(\mu_{x|y} - \hat{x}) \mid y]\} = 0. \tag{3.16}$$

Thus, Equation (3.15) becomes

$$E\{(x - \hat{x})^2\} = E\{(x - \mu_{x|y})^2\} + E\{(\mu_{x|y} - \hat{x})^2\}, \tag{3.17}$$

and the right-hand side is minimized by setting $\hat{x} = E\{x|y\}$.

In practical situations, the optimal estimator, $\hat{x} = E\{x|y\}$,
is almost never known. Linear estimators, while only approximations to
true minimum mean square error estimators, lead to easily implemented
estimators. Since mathematical tractability and ease of implementation
are the major concerns in this research, all the succeeding analysis
concerns linear estimators. We allow as estimators only those
estimators that are linear functions of the random variable $y$. Since
this class of allowable estimators is smaller than in the general
minimum mean square estimation, the resulting minimum mean square
estimation error is at best equal to but generally greater than the true
minimum mean square error.
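
The gap between the conditional-mean estimator and the best linear estimator can be seen in a small non-Gaussian simulation. The model below (uniform y, x equal to y squared plus Gaussian noise) and the affine fit with a constant term are assumptions chosen for illustration only.

```python
import random

rng = random.Random(4)
n = 20000
ys = [rng.uniform(-1.0, 1.0) for _ in range(n)]
xs = [y * y + rng.gauss(0.0, 0.1) for y in ys]   # hypothetical model: E{x|y} = y^2

# Best affine estimator a + b*y, fitted from sample moments.
mean_y = sum(ys) / n
mean_x = sum(xs) / n
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
var_y = sum((y - mean_y) ** 2 for y in ys) / n
b = cov_xy / var_y
a = mean_x - b * mean_y

# Mean square error of the conditional mean y^2 and of the affine estimator.
mse_conditional = sum((x - y * y) ** 2 for x, y in zip(xs, ys)) / n
mse_linear = sum((x - (a + b * y)) ** 2 for x, y in zip(xs, ys)) / n
print(f"conditional mean: {mse_conditional:.4f}, best affine: {mse_linear:.4f}")
```

In this example x and y are nearly uncorrelated, so the affine estimator can do little better than the mean of x, while the conditional mean is limited only by the measurement noise. For jointly Gaussian variables the two estimators would coincide.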

A useful result of minimum mean square error estimation (Rhodes, 100)
is that the estimation error $\tilde{x} = x - \hat{x}$ of the least
mean square estimator $\hat{x} = E\{x|y\}$ is uncorrelated with any
function $g$ of the random vector $y$, i.e.,

$$E\{g(y)\tilde{x}\} = 0 \tag{3.18}$$

and

$$E\{g(y)\tilde{x} \mid y\} = 0. \tag{3.19}$$

The following theorem provides both the necessary and sufficient
conditions for the linear minimum mean square error and shows that the
linear mean square error estimator is unique. Proofs of each of the
assumptions and assertions of this theorem appear in Karlin and Taylor
(66), Van Trees (124), and Feller (44).

Theorem 3.4 (Karlin and Taylor, 66)

Let $x$ satisfy the relation $E\{x^2\} < \infty$ and assume that all
estimators $\hat{x}$ satisfy the same relation. We will also allow as
estimators any linear combination of estimators, i.e., if $\hat{x}_1$
and $\hat{x}_2$ are estimators then $a\hat{x}_1 + b\hat{x}_2$ is also an
estimator for any real $a$ and $b$.

A. An estimator $\hat{x}$ has minimum mean square error if and only if
$E\{(x - \hat{x})u\} = 0$ for every estimator $u$.

B. If we assume the existence of minimum mean square estimators
$\hat{x}_1$ and $\hat{x}_2$, then the minimum mean square error
estimator is unique in the sense that
$E\{(\hat{x}_1 - \hat{x}_2)^2\} = 0$.
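
As a sketch of how part A is used, consider estimators that are linear combinations of $N$ observed real random variables collected in a vector $\mathbf{y}$. The symbols $\mathbf{w}$, $\mathbf{c}$, $R$ and $\mathbf{p}$ are introduced here for illustration and are not the notation of the theorem.

```latex
\begin{align*}
\hat{x} &= \mathbf{w}^{T}\mathbf{y}, \qquad
  u = \mathbf{c}^{T}\mathbf{y} \ \ \text{for arbitrary real } \mathbf{c}, \\
E\{(x - \hat{x})u\}
  &= \mathbf{c}^{T} E\{\mathbf{y}\,(x - \mathbf{y}^{T}\mathbf{w})\} = 0
  \quad \text{for all } \mathbf{c} \\
\Longrightarrow \quad E\{\mathbf{y}\mathbf{y}^{T}\}\,\mathbf{w} &= E\{\mathbf{y}\,x\},
  \qquad \text{i.e.} \quad R\,\mathbf{w} = \mathbf{p}.
\end{align*}
```

When $R$ is nonsingular this gives $\mathbf{w} = R^{-1}\mathbf{p}$, the familiar normal-equation form of the linear minimum mean square error solution.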

The following definitions and theorems contain mathematical concepts
necessary in the proofs of convergence of the stochastic approximation
algorithm. Other results from conditional probability theory used in
succeeding proofs appear in Appendix C.
Definition 3.1

If we assume that $\{x_n,\ n = 1, 2, \ldots\}$ and
$\{y_n,\ n = 1, 2, \ldots\}$ are stochastic processes and that
$\{x_n\}$ satisfies

$$E\{|x_n|\} < \infty \tag{3.20}$$

and

$$E\{x_{n+1} \mid y_1, \ldots, y_n\} = x_n, \tag{3.21}$$

then $\{x_n\}$ is a martingale with respect to $\{y_n\}$.

From a practical point of view, the stochastic process $\{y_n\}$ can be
considered as the available information up to time $n$. From real
analysis, $\{x_n\}$ can be considered a function of $\{y_n\}$.

Theorem 3.5

Let $\{x_n\}$ be a martingale with respect to $\{y_n\}$ satisfying

$$E\{x_n^2\} < \infty. \tag{3.22}$$

Then $\{x_n\}$ converges to a limit both with probability one and in the
mean square, i.e.,

$$\Pr\{\lim_{n\to\infty} x_n = x_\infty\} = 1 \tag{3.23}$$

and

$$\lim_{n\to\infty} E\{|x_n - x_\infty|^2\} = 0. \tag{3.24}$$
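
A simple martingale with bounded second moments illustrates the theorem. The construction below, a sum of independent fair-coin terms scaled by 1/k, is an assumed toy example and is not the filter coefficient sequence of this research.

```python
import random

def martingale_path(n_steps, seed=5):
    """x_n = sum over k <= n of y_k / k with fair-coin y_k = +/-1 (a toy example)."""
    rng = random.Random(seed)
    x = 0.0
    path = []
    for k in range(1, n_steps + 1):
        y_k = rng.choice((-1.0, 1.0))   # zero mean and independent of the past,
        x += y_k / k                    # so E{x_k | y_1, ..., y_{k-1}} = x_{k-1}
        path.append(x)
    return path

path = martingale_path(100000)
x_half, x_full = path[49999], path[-1]
# E{x_n^2} equals the partial sum of 1/k^2, which stays below pi^2 / 6.
second_moment = sum(1.0 / k ** 2 for k in range(1, 100001))
print(f"x at n=50000: {x_half:.4f}, x at n=100000: {x_full:.4f}")
print(f"E{{x_n^2}} at n=100000: {second_moment:.5f}")
```

The path settles down as n grows because the variance of the remaining increments shrinks to zero, while the second moment remains bounded for all n.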


Definition 3.2

A wide-sense stationary process is a stochastic process
$\{x_n,\ n \in N\}$ having finite second moments, $E\{x_n^2\} < \infty$,
a constant mean, $m = E\{x_n\}$, and a covariance function,
$E\{[x_{n_1} - m][x_{n_2} - m]\}$, that depends only on the time
difference $|n_1 - n_2|$.

The preceding definitions and theorems, along with the following
property of conditional expectation,

$$E\{f(y_1, \ldots, y_n) \mid y_1, \ldots, y_n\} = f(y_1, \ldots, y_n), \tag{3.25}$$

are the cornerstones for the proofs involving the stochastic
approximation algorithm.

3.2.2 Regression Function and Probabilistic Convergence

We want to estimate the random variable, $y$, by a suitable function $g(x)$ of $x$ so that the mean square estimation error

$E\{[y - g(x)]^2\} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} [y - g(x)]^2 f(x, y)\, dx\, dy$  (3.26)

is minimum. The function $g(x)$ that minimizes the above expression is, from the previous section, the conditional expected value of $y$ assuming $x$, i.e.,

$r(x) = E\{y \mid x\}$.  (3.27)

The function

$r(x) = E\{y \mid x\}$  (3.28)

is known as a regression function. This definition of a regression function is used in all the analysis that follows and specifically in the stochastic approximation algorithm.
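
The minimizing property of the regression function can be checked on a small discrete example. The following is only an illustrative sketch: the joint probability table `joint` and the helper names `conditional_mean` and `mse` are hypothetical and do not come from the original analysis. It computes $r(x) = E\{y \mid x\}$ from the table and compares its mean square error with that of a perturbed estimator.

```python
def conditional_mean(joint):
    """joint maps (x, y) -> probability; return r(x) = E{y | x} as a dict."""
    px, pxy = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p          # marginal probability of x
        pxy[x] = pxy.get(x, 0.0) + p * y    # accumulates E{y ; x}
    return {x: pxy[x] / px[x] for x in px}


def mse(joint, g):
    """Mean square error E{[y - g(x)]^2} for an estimator g given as a dict."""
    return sum(p * (y - g[x]) ** 2 for (x, y), p in joint.items())


# Hypothetical joint distribution of (x, y), chosen for illustration only.
joint = {(0, 0.0): 0.3, (0, 1.0): 0.1, (1, 0.0): 0.2, (1, 2.0): 0.4}
r = conditional_mean(joint)

# Any other estimator, e.g. a shifted one, should not do better than r.
shifted = {0: r[0] + 0.1, 1: r[1] - 0.2}
print(r, mse(joint, r), mse(joint, shifted))
```

For any estimator $g$, the error splits as $E\{[y - r(x)]^2\} + E\{[r(x) - g(x)]^2\}$, which is why the shifted estimator cannot have a smaller mean square error.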

Since the value of the gradient or its counterpart, the regression function, is not known exactly but depends on some stochastic process, we cannot use the ordinary concepts of convergence for nonrandom processes. The concepts of convergence must be redefined to encompass the stochastic nature of the recursive algorithm.

There are three types of stochastic convergence (Cramer and Leadbetter, 29): convergence in probability, mean square convergence, and convergence with probability one. A vector $\mathbf{h}_n$ converges in probability to $\mathbf{h}_{OPT}$ as $n \to \infty$ if, for any $\epsilon > 0$, the probability that the norm $\|\mathbf{h}_n - \mathbf{h}_{OPT}\|$ exceeds $\epsilon$ converges to zero, i.e.,

$\lim_{n \to \infty} \mathrm{PROB}\{\|\mathbf{h}_n - \mathbf{h}_{OPT}\| > \epsilon\} = 0$.  (3.29)

Convergence in probability does not imply that every sequence $\mathbf{h}_n$ of random vectors converges to $\mathbf{h}_{OPT}$ in the ordinary sense. A random vector $\mathbf{h}_n$ converges to $\mathbf{h}_{OPT}$ in the mean square sense as $n \to \infty$ if the mathematical expectation of the square of the norm $\|\mathbf{h}_n - \mathbf{h}_{OPT}\|$ converges to zero:

$\lim_{n \to \infty} E\{\|\mathbf{h}_n - \mathbf{h}_{OPT}\|^2\} = 0$  (3.30)

Convergence in the mean square implies convergence in probability, but it does not imply ordinary convergence for any random vector $\mathbf{h}_n$. Convergence in the mean square is related to the investigation of the moments of the second order. The above two types of convergence may not be satisfactory since, in both types of convergence, the probability that a given vector $\mathbf{h}_n$ converges to $\mathbf{h}_{OPT}$ in an ordinary sense can be zero. Since, in the recursive algorithm, we only have available a sample or noisy regression function (or gradient), it is desirable that the limit exists for that particular sequence of random vectors $\mathbf{h}_n$ which is actually observed, and not for a family of random sequences which may never be observed. This type of convergence can be assured if we introduce the concept of convergence almost everywhere (a.e.) or convergence with probability one. Since $\mathbf{h}_n$ is a stochastic process, we can consider the convergence of a sequence $\mathbf{h}_n$ to $\mathbf{h}_{OPT}$ as a random event. The sequence of random vectors $\mathbf{h}_n$ converges to $\mathbf{h}_{OPT}$ as $n \to \infty$ almost certainly, or with probability one, if the probability of ordinary convergence of $\mathbf{h}_n$ to $\mathbf{h}_{OPT}$ is equal to one:

$\mathrm{PROB}\{\lim_{n \to \infty} \|\mathbf{h}_n - \mathbf{h}_{OPT}\| = 0\} = 1$  (3.31)



Thus,  by  neglecting  the  set  of  sequences  of  random  vectors  with  total 
probability  equal  to  zero,  we  have  an  ordinary  convergence. 

The great power of the method of stochastic approximation lies in the fact that, if the conditions of convergence are satisfied, then the random vector $\mathbf{h}_n$ converges with probability one to $\mathbf{h}_{OPT}$.

The following section contains the general convergence proof for a recursive algorithm (vector case) using stochastic approximation and it establishes that the sequence of solutions is a martingale. It will be seen throughout this analysis that the fact that the recursive algorithm is a martingale leads to many advantageous properties. The matrix case of the recursive algorithm is a simple extension of the vector case. We can define a single output as

$y^i = (\mathbf{h}^i)^T \mathbf{x}$  (3.32)

where $\mathbf{x}$ is the input vector and the $\mathbf{h}^i$ are the columns of the matrix filter. For the multi-output case, we get

$\mathbf{y} = [\mathbf{h}^1\ \mathbf{h}^2\ \ldots\ \mathbf{h}^n]^T \mathbf{x} = H^T \mathbf{x}$  (3.33)

where the columns of the matrix filter are the results of the vector case.

After the proof of convergence of the general recursive algorithm has been established, we show the statistical and geometrical interpretations involved in this convergence theorem. These geometrical interpretations are important in understanding the intuitive arguments employed in the analysis of the complex matrix adaptive filter.

3.3  Convergence  Proof  of  Stochastic  Approximation 

Theorem 3.6 and the succeeding proof apply to any recursive algorithm that satisfies the conditions of this theorem.

It is proposed that the matrix algorithm of the form

$H_{n+1} = H_n - \mu_n R(\mathbf{x} \mid H_n)$  (3.34)

where $R(\mathbf{x} \mid H)$ is the regression function and $\mu_n$ is the special gain sequence, or the vector form of the algorithm

$\mathbf{h}_{n+1} = \mathbf{h}_n - \mu_n \mathbf{r}(\mathbf{x} \mid \mathbf{h}_n)$  (3.35)

be used as the recursive algorithm to find the matrix filter weights.

It is shown in this section that if any algorithm of the form of Equation (3.35) satisfies the convergence conditions of stochastic approximation theory, then the recursive algorithm will converge to $\mathbf{h}_{OPT}$ with probability one. The matrix form [Equation (3.34)] follows from the vector form [Equation (3.35)]. In Equation (3.34), $R(\mathbf{x} \mid H)$ can either be the regression function derived from the orthogonal projection lemma or the gradient of the performance measure. Certain restrictions will be placed on the regression function to guarantee convergence and it will be shown that the adaptive algorithm (vector case) satisfies these conditions.
Theorem 3.6

Let $\mu_n$ be a sequence of numbers such that

A1. $\mu_n > 0$  (3.36)

A2. $\sum_{n=1}^{\infty} \mu_n = \infty$  (3.37)

A3. $\sum_{n=1}^{\infty} \mu_n^2 < \infty$.  (3.38)

Let the following conditions be satisfied,

B. $\inf_{\epsilon < \|\mathbf{h} - \mathbf{h}_{OPT}\| < 1/\epsilon} E\{(\mathbf{h} - \mathbf{h}_{OPT})^T \mathbf{r}(\mathbf{x} \mid \mathbf{h})\} > 0$, $\epsilon > 0$  (3.39)

C. $E\{\mathbf{r}(\mathbf{x} \mid \mathbf{h})^T \mathbf{r}(\mathbf{x} \mid \mathbf{h})\} \le d(1 + \|\mathbf{h} - \mathbf{h}_{OPT}\|^2)$  (3.40)

for all $\mathbf{h}$ in a bounded set and $d > 0$. If the preceding conditions (A1, A2, A3, B, C) are satisfied, then the sequence $\mathbf{h}_n$ defined by Equation (3.35) converges with probability one and in the mean square to the root of the regression function.

It is to be noted that where derivatives exist, the regression function can be replaced by the gradient. It will be simpler in this proof if the existence of the gradient is assumed. In any proofs that follow, the gradient $\nabla G$ will be equivalent to the regression function $\mathbf{r}(\mathbf{x} \mid \mathbf{h})$ and they will be used interchangeably.

The preceding theorem contains the sufficient conditions for convergence of any stochastic algorithm. If these convergence conditions are satisfied, then the recursive algorithm [Equation (3.35)] can be guaranteed to converge to the optimum solution with probability one (Gladyshev, 49). The mean square convergence is not formally established but it can be gleaned from the analysis.

The following proof, patterned after that of Gladyshev (49), shows that the stochastic algorithm (vector case) converges with probability one and in the mean square to the optimal solution if it satisfies the previous conditions (A1, A2, A3, B, and C). This proof applies to both the real and complex variable cases [if the hermitian transpose $(\cdot)^H$ is used instead of the regular transpose $(\cdot)^T$ and the regression function is used instead of the gradient].
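
Before the formal proof, the behavior claimed by the theorem can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the processor described in this work: it assumes a real linear model $d = \mathbf{h}_{OPT}^T \mathbf{x} + \text{noise}$ with bounded uniform inputs, uses the noisy regression function $\mathbf{r} = -\mathbf{x}(d - \mathbf{h}^T \mathbf{x})$, and takes the gain $\mu_n = 1/n^{0.75}$, which satisfies A1, A2 and A3. The function name `robbins_monro` and all numerical values are hypothetical.

```python
import random


def robbins_monro(h_opt, steps=20000, exponent=0.75, noise=0.1, seed=0):
    """Run h_{n+1} = h_n - mu_n * r_n with gain mu_n = 1 / n**exponent."""
    rng = random.Random(seed)          # seeded so the run is repeatable
    dim = len(h_opt)
    h = [0.0] * dim                    # arbitrary starting filter
    for n in range(1, steps + 1):
        # Bounded inputs keep the early, large-gain steps from diverging.
        x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        d = sum(a * b for a, b in zip(h_opt, x)) + rng.uniform(-noise, noise)
        err = d - sum(a * b for a, b in zip(h, x))
        mu = 1.0 / n ** exponent
        # Noisy regression function r_n = -x * err, so h - mu*r = h + mu*x*err.
        h = [hi + mu * xi * err for hi, xi in zip(h, x)]
    return h


h_final = robbins_monro([1.0, -2.0])
print(h_final)  # expected to lie close to the assumed optimum [1.0, -2.0]
```

Only a single observed sequence is generated here, which matches the motivation for convergence with probability one: the limit is required for the sequence actually observed.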

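PROOF

Subtracting $\mathbf{h}_{OPT}$ from both sides of Equation (3.35) gives

$\mathbf{h}_{n+1} - \mathbf{h}_{OPT} = \mathbf{h}_n - \mathbf{h}_{OPT} - \mu_n \nabla G_n$  (3.41)

where $\mathbf{r}(\mathbf{x} \mid \mathbf{h}) = \nabla G_n$ and $\mathbf{h}_{OPT} = \mathbf{h}_0$. Squaring Equation (3.41) gives

$(\mathbf{h}_{n+1} - \mathbf{h}_0)^T (\mathbf{h}_{n+1} - \mathbf{h}_0) = (\mathbf{h}_n - \mathbf{h}_0)^T (\mathbf{h}_n - \mathbf{h}_0) - 2\mu_n (\mathbf{h}_n - \mathbf{h}_0)^T \nabla G_n + \mu_n^2 \nabla G_n^T \nabla G_n$.  (3.42)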


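Taking the mathematical expectation for a given $\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_n$, we get:

$E\{\|\mathbf{h}_{n+1} - \mathbf{h}_0\|^2 \mid \mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_n\} = \|\mathbf{h}_n - \mathbf{h}_0\|^2 - 2\mu_n E\{(\mathbf{h}_n - \mathbf{h}_0)^T \nabla G_n\} + \mu_n^2 E\{\nabla G_n^T \nabla G_n\}$  (3.43)

where $\|\mathbf{h}_n - \mathbf{h}_0\|^2 = (\mathbf{h}_n - \mathbf{h}_0)^T (\mathbf{h}_n - \mathbf{h}_0)$. From Condition C, Equation (3.43) becomes:

$E\{\|\mathbf{h}_{n+1} - \mathbf{h}_0\|^2 \mid \mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_n\} \le \|\mathbf{h}_n - \mathbf{h}_0\|^2 - 2\mu_n E\{(\mathbf{h}_n - \mathbf{h}_0)^T \nabla G_n\} + \mu_n^2 d[1 + \|\mathbf{h}_n - \mathbf{h}_0\|^2]$  (3.44)

Using Condition B, Equation (3.44) is reduced to

$E\{\|\mathbf{h}_{n+1} - \mathbf{h}_0\|^2 \mid \mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_n\} \le \|\mathbf{h}_n - \mathbf{h}_0\|^2 (1 + \mu_n^2 d) + \mu_n^2 d$  (3.45)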


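If we define

$\beta_n = \|\mathbf{h}_n - \mathbf{h}_0\|^2 \prod_{k=n}^{\infty} (1 + \mu_k^2 d) + \sum_{k=n}^{\infty} 2d\mu_k^2 \prod_{m=k+1}^{\infty} (1 + \mu_m^2 d)$  (3.46)

then

$\beta_{n+1} = \|\mathbf{h}_{n+1} - \mathbf{h}_0\|^2 \prod_{k=n+1}^{\infty} (1 + \mu_k^2 d) + \sum_{k=n+1}^{\infty} 2d\mu_k^2 \prod_{m=k+1}^{\infty} (1 + \mu_m^2 d)$.  (3.47)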


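Taking the conditional expectation for given $\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_n$, and using Equations (3.45), (3.46), (3.47), we get:

$E\{\beta_{n+1} \mid \mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_n\} = E\{\|\mathbf{h}_{n+1} - \mathbf{h}_0\|^2 \mid \mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_n\} \prod_{k=n+1}^{\infty} (1 + \mu_k^2 d) + \sum_{k=n+1}^{\infty} 2d\mu_k^2 \prod_{m=k+1}^{\infty} (1 + \mu_m^2 d)$

$\le [\|\mathbf{h}_n - \mathbf{h}_0\|^2 (1 + d\mu_n^2) + 2\mu_n^2 d] \prod_{k=n+1}^{\infty} (1 + \mu_k^2 d) + \sum_{k=n+1}^{\infty} 2d\mu_k^2 \prod_{m=k+1}^{\infty} (1 + \mu_m^2 d) = \beta_n$  (3.48)

or, more compactly,

$E\{\beta_{n+1} \mid \mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_n\} \le \beta_n$.  (3.49)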


Since $\beta_n = f(\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_n)$, then if we take the conditional expectation for given $\beta_1, \beta_2, \ldots, \beta_n$ of both sides of Equation (3.49), and use the conditional expectation identities in Appendix C, we obtain

$E\{\beta_{n+1} \mid \beta_1, \beta_2, \ldots, \beta_n\} \le \beta_n$.  (3.50)

The inequality in Equation (3.49) shows that the $\beta_n$'s are a semimartingale, where:

$E\{\beta_{n+1}\} \le E\{\beta_n\} \le \ldots \le E\{\beta_1\} < \infty$,  (3.51)

so that according to the theory of martingales (Doob, 38) the sequence $\beta_n$ converges with probability one, and by virtue of Equations (3.45) and (3.46), the sequence $(\mathbf{h}_n - \mathbf{h}_0)$ also converges with probability one to some random number $\xi$. The fact that the sequence of filters is a martingale will be used not only to prove convergence of the recursive algorithm [Equation (3.34)] which is used to calculate the matrix filter but also to show that the stochastic control system, represented by the recursive algorithm, is stable in the sense of stochastic stability. A list of martingale properties appears in Appendix C.

To complete the proof of the theorem, it remains to be shown that $\mathrm{PROB}(\xi = 0) = 1$.

It can be seen from Equations (3.51) and (3.45) and the fact that $\mu_n^2 > 0$ that the sequence $E\{\|\mathbf{h}_n - \mathbf{h}_0\|^2\}$ is bounded. Let us take the mathematical expectation on both sides of inequality Equation (3.44):

$E\{\|\mathbf{h}_{n+1} - \mathbf{h}_0\|^2\} \le E\{\|\mathbf{h}_n - \mathbf{h}_0\|^2\} - 2\mu_n E\{(\mathbf{h}_n - \mathbf{h}_0)^T \nabla G_n\} + \mu_n^2 d[1 + E\|\mathbf{h}_n - \mathbf{h}_0\|^2]$.  (3.52)

Adding the first $n$ inequalities together, we have by deduction:

$E\{\|\mathbf{h}_{n+1} - \mathbf{h}_0\|^2\} \le E\{\|\mathbf{h}_1 - \mathbf{h}_0\|^2\} + \sum_{k=1}^{n} d\mu_k^2 (1 + E\|\mathbf{h}_k - \mathbf{h}_0\|^2) - \sum_{k=1}^{n} 2\mu_k E\{(\mathbf{h}_k - \mathbf{h}_0)^T \nabla G_k\}$  (3.53)


Since $E\{\|\mathbf{h}_n - \mathbf{h}_0\|^2\}$ is bounded and Condition A3 is fulfilled, using Equation (3.52), it follows that:

$\sum_{k=1}^{\infty} \mu_k E\{(\mathbf{h}_k - \mathbf{h}_0)^T \nabla G_k\} < \infty$  (3.54)

Using Condition A2 and noting that from Condition B,

$\inf_{\epsilon < \|\mathbf{h} - \mathbf{h}_0\| < 1/\epsilon} E\{(\mathbf{h} - \mathbf{h}_0)^T \nabla G\} > 0$,  (3.55)

we deduce from Equation (3.54) that $(\mathbf{h}_{n_k} - \mathbf{h}_0)^T \nabla G_{n_k} \to 0$ with probability one for some sequence $n_k$.

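Using the fact that $(\mathbf{h}_{n_k} - \mathbf{h}_0)^T \nabla G_{n_k} \to 0$ with probability one, the fact that the regression function has a unique root, and the fact that $(\mathbf{h}_n - \mathbf{h}_0) \to \xi$ with probability one from the first half of the proof, we conclude that $\xi = 0$ with probability one. Therefore, the algorithm

$\mathbf{h}_{n+1} = \mathbf{h}_n - \mu_n \nabla G_n$  (3.56)

converges with probability one,

$\mathrm{PROB}\{\lim_{n \to \infty} (\mathbf{h}_n - \mathbf{h}_0) = 0\} = 1$  (3.57)

as well as in the mean square sense,

$\lim_{n \to \infty} E\{\|\mathbf{h}_n - \mathbf{h}_0\|^2\} = 0$.  (3.58)
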
3.4 Statistical and Geometrical Significance of the Convergence Criterion

Recall from the proof of the recursion relation for the adaptive filter the conditions (A1, A2, A3, B, C) imposed on the properties of the gain sequence, $\mu_n$, as well as on the behavior of the regression function $\mathbf{r}(\mathbf{x} \mid \mathbf{h})$. These conditions not only guarantee convergence of the recursive algorithm both with probability one and in the mean square sense but also carry important statistical and geometrical meaning.

The condition that $\mu_n > 0$ is to assure that the corrections, on the average, are made in the direction toward the minimum.

The condition that $\sum_{n=1}^{\infty} \mu_n^2 < \infty$ is to account for the accumulative effect of the error in measurement. If random noise enters the measurement at each iteration step, then this condition assures that the random measurement error approaches zero as the number of iterations becomes large. This condition also implies the condition that

$\lim_{n \to \infty} \mu_n = 0$  (3.59)

We can see that if we let $\mu_n$ approach zero in this manner then we can attain arbitrary accuracy in our movement to the minimum of the regression function.

The above conditions assure that $\mathbf{h}_n$ converges on some value $\mathbf{h}_\infty$. The condition $\sum_{n=1}^{\infty} \mu_n \to \infty$ assures that value, $\mathbf{h}_\infty$, converges to $\mathbf{h}_{OPT}$. If $\mathbf{h}_n$ approaches any value other than $\mathbf{h}_{OPT}$, the total correction effect $\sum_{n=1}^{\infty} \mu_n \mathbf{r}(\mathbf{x} \mid \mathbf{h}_n)$ is infinite.

The above conditions guarantee that if the sequence $\mu_n$ satisfies conditions A1, A2 and A3, the total correction effect of the sequence $\mu_n$ allows the recursive algorithm to approach $\mathbf{h}_{OPT}$ with infinite correction, and as $\mu_n$ gets smaller we can approach the minimum with arbitrary accuracy.

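The condition that

$\inf_{\epsilon < \|\mathbf{h} - \mathbf{h}_{OPT}\| < 1/\epsilon} E\{(\mathbf{h} - \mathbf{h}_{OPT})^T \mathbf{r}(\mathbf{x} \mid \mathbf{h})\} > 0$  (3.60)

for $\epsilon > 0$ determines the behavior of the surface $E\{\mathbf{r}(\mathbf{x} \mid \mathbf{h})\} = 0$ close to the zero of the regression function. If the error criterion does have a unique minimum, the above condition is satisfied.

The condition that

$E\{\mathbf{r}^T(\mathbf{x} \mid \mathbf{h})\, \mathbf{r}(\mathbf{x} \mid \mathbf{h})\} \le d(1 + \|\mathbf{h} - \mathbf{h}_{OPT}\|^2)$  (3.61)

for $d > 0$ requires that the mathematical expectation of the quadratic forms

$E\{\mathbf{r}^T(\mathbf{x} \mid \mathbf{h})\, \mathbf{r}(\mathbf{x} \mid \mathbf{h})\}$  (3.62)

increase, as $\mathbf{h}$ increases, no faster than a quadratic function of the weights. This condition guarantees that the variance of the regression function is bounded.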




CHAPTER  IV 


ADAPTIVE  FILTER  DERIVATION 


4.0 Introduction

Using the results from the previous chapter, the optimum matrix filter is derived and its convergence established. The sequence of proofs that shows the adaptive algorithm satisfies the convergence conditions of stochastic approximation appears in Section 4.3. While the satisfaction of the convergence conditions guarantees convergence in the mathematical sense, the main thrust of this research has been to devise a system that obtains a fast initial increase in S/N (fast initial convergence), that is, within fewer than 50 iterations of the recursive algorithm. The satisfaction of the convergence conditions of stochastic approximation then provides the basis to expect ultimate convergence if the recursive algorithm were run long enough. This fact, although mathematically elegant, is not very useful for practical applications of the adaptive processor.

We show that under the weak signal strong interference assumption, the present adaptive algorithm solution corresponds to the diagonalization of the cross spectral density matrix. The proof of convergence of the adaptive algorithm is done for both the real and complex forms. It is shown rigorously that for the real variable case, the convergence conditions for stochastic approximation are satisfied. In the complex case, the five convergence conditions can be shown to be satisfied by mathematical analysis. The satisfaction of the fifth condition is also clear from geometrical and intuitive reasoning.

In  the  final  section,  an  expression  is  derived  for  the  minimum 
mean  square  error  and  it  is  shown  that  in  the  case  of  the  matrix  filter, 
a decomposition  of  the  mean  square  error  into  the  contributions  from 
each  beam  is  more  meaningful  than  considering  the  total  mean  square 
error  due  to  all  beams. 

4.1 Derivation of Complex Matrix Filter

In  order  to  avoid  embarrassing  difficulties  in  differentiation 
of  complex  functions,  the  orthogonal  projection  lemma  is  used  for 
obtaining  expressions  for  linear  approximation  to  minimum  mean  square 
estimation  of  random  processes.  The  justification  for  the  use  of  the 
orthogonal  projection  lemma  comes  from  Theorem  3.4.  In  this  theorem  the 
basic  fact  used  is  that  the  necessary  and  sufficient  condition  for  the 
optimum  linear  filter  for  minimum  mean  square  error  is  the  satisfaction 
of  the  orthogonal  projection  lemma. 

The following development obtains the frequency domain matrix Wiener-Hopf equation for a multi-input, multi-output system by use of the orthogonality conditions for stochastic processes. From the orthogonality condition of Theorem 3.4, the optimum linear matrix filter can be derived. From considerations of the optimum matrix filter and the orthogonality conditions, we can formulate an equation for the regression function to be used in the stochastic approximation algorithm. This regression function is used in the complex stochastic approximation algorithm to perform the same function as the gradient in determining the adaptive matrix filter. The following theorem provides the regression function needed in the stochastic approximation algorithm which calculates the matrix filter.

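Theorem 4.1

A necessary and sufficient condition for the $n \times n$ matrix filter, $H(f)$, to minimize the mean square error

$E(\boldsymbol{\varepsilon}^H \boldsymbol{\varepsilon}) = E[(\mathbf{d} - H^T \mathbf{x})^H (\mathbf{d} - H^T \mathbf{x})]$  (4.1)

is that the orthogonality conditions

$E\{(\mathbf{d} - H^T \mathbf{x})\, \mathbf{x}^H\} = 0$  (4.2)

or, equivalently,

$E\{\mathbf{x}\, (\mathbf{d} - H^T \mathbf{x})^H\} = 0$  (4.3)

are satisfied.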

It is only needed to state the results from Theorem 3.4 to prove the above theorem. The optimum filter can now be derived from the orthogonality conditions of Theorem 4.1. From Equation (4.2), we have:

$E\{\boldsymbol{\varepsilon} \mathbf{x}^H\} = E\{(\mathbf{d} - H^T \mathbf{x})\, \mathbf{x}^H\} = 0$,

$G_{dx} - H^T G_{xx} = 0$.  (4.4)

From Equation (4.3), we have:

$E\{\mathbf{x} \boldsymbol{\varepsilon}^H\} = E\{\mathbf{x}\, (\mathbf{d} - H^T \mathbf{x})^H\} = E(\mathbf{x} \mathbf{d}^H) - E(\mathbf{x} \mathbf{x}^H) H^* = G_{xd} - G_{xx} H^* = 0$,

and

$H^* = [G_{xx}]^{-1} G_{xd}$.  (4.5)


In cases where the gradient of the mean square error exists, the orthogonality condition is equivalent to the gradient. Hence, the estimated orthogonality condition can be used instead of the estimated gradient in the adaptive algorithm. The orthogonal projection lemma suggests a stochastic approximation algorithm to find the adaptive matrix filter of the form

$H^*_{n+1} = H^*_n + \mu_n [G_{xd} - G_{xx} H^*_n] = H^*_n + \mu_n R(\mathbf{x} \mid H_n)$  (4.6)

Equation (4.6) is in state variable form and this implementation appears in Figure 8. It is to be noted that the optimum filter depends only on the power spectral densities of the input signal and noise. Due to the mean square error criterion, any two realizations of input random processes with the same power spectral density will require the same optimum filters.

It is true that all processes, Gaussian or non-Gaussian, with the same auto-power spectral density and cross-power spectral density matrices lead to the same processor and the same mean square error if the processing is required to be linear (Rhodes, 100; Van Trees, 125).
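
The recursion of Equation (4.6) can be tried on a small idealized case in which the spectral matrices are known exactly rather than estimated. The sketch below is illustrative only and rests on several assumptions: the quantities are real-valued, a single filter column is updated, the 2 x 2 matrix `G_xx`, the vector `g_xd` and the value `lam_max` are hypothetical, and the gain is taken as $(2/\lambda_{max})/n^{0.75}$. The function name `iterate_filter` is also hypothetical.

```python
def iterate_filter(G_xx, g_xd, lam_max, steps=2000, exponent=0.75):
    """Iterate h_{n+1} = h_n + mu_n * (g_xd - G_xx h_n) for one filter column.

    mu_n = (2 / lam_max) / n**exponent, with lam_max the largest
    eigenvalue of G_xx (supplied by the caller in this sketch).
    """
    dim = len(g_xd)
    h = [0.0] * dim
    for n in range(1, steps + 1):
        mu = (2.0 / lam_max) / n ** exponent
        # Residual of the orthogonality condition, G_xd - G_xx h.
        residual = [g_xd[i] - sum(G_xx[i][j] * h[j] for j in range(dim))
                    for i in range(dim)]
        h = [h[i] + mu * residual[i] for i in range(dim)]
    return h


# Hypothetical input covariance with eigenvalues 3 and 1.
G_xx = [[2.0, 1.0], [1.0, 2.0]]
g_xd = [1.0, 0.0]
h = iterate_filter(G_xx, g_xd, lam_max=3.0)
print(h)  # should approach the direct solution of G_xx h = g_xd
```

For these values the direct solution of $G_{xx} \mathbf{h} = \mathbf{g}_{xd}$ is $[2/3, -1/3]$, so the iterate can be compared against the closed form of Equation (4.5).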

In  the  statistical  case,  while  the  performance  measure  has  a 
unique  minimum,  the  random  process  that  attains  this  minimum  may  be 
different  for  each  realization  of  the  random  process  represented  by  the 
filter  weights  and  only  the  long  term  statistical  properties  determine 
the  optimum  filter  when  the  filter  is  defined  to  be  a linear  function  of 
the  inputs.  It  is  in  the  case  of  Gaussian  processes  that  no  nonlinear 
filter  is  superior  to  the  best  linear  filter  in  the  mean  square  error 
sense. 


4.2  Decoupling  Concept  of  Adaptive  Filter 

According to the orthogonality condition, the sufficient condition for minimization of the mean square error is

$G_{dx}(f) - H(f)^T G_{xx}(f) = 0$  (4.7)

which implies

$G_{dx}(f) = G_{yx}(f) = H(f)^T G_{xx}(f)$.  (4.8)

Now, if one assumes that the cross-spectrum matrix between desired signal, $\mathbf{d}$, and input vector, $\mathbf{x}$, is

$G_{dx}(f) = \mathrm{Diag}[G_{d_i x_i}]$  (4.9)

then a typical off-diagonal term is:

$G_{y_i x_j} = 0$ for $i \ne j$.  (4.10)

This means that under the so-called weak signal strong interference assumption, minimization of the mean square error is essentially the same as diagonalization of the cross-spectral density matrix. It is under this assumption that the performance of the proposed adaptive algorithm is to be judged. The matrix representation for the decoupling concept appears in Figure 9.

4.3  Proof  of  Convergence  of  Decoupling  Matrix  Filter 

This section shows that the adaptive decoupler algorithm (the vector case) satisfies the convergence conditions required by stochastic approximation and therefore converges, as the number of iterations becomes large, to the optimum decoupling matrix filter with probability one and in the mean square.

The algorithm

$\mathbf{h}^*_{n+1} = \mathbf{h}^*_n - \mu_n \mathbf{r}(\mathbf{x}_n \mid \mathbf{h}_n)$  (4.11)

converges if the following conditions are satisfied:

A1. $\mu_n > 0$;  A2. $\sum_{n=1}^{\infty} \mu_n = \infty$;  A3. $\sum_{n=1}^{\infty} \mu_n^2 < \infty$  (4.12)

B. $\inf_{\epsilon < \|\mathbf{h} - \mathbf{h}_0\| < 1/\epsilon} E\{(\mathbf{h} - \mathbf{h}_0)^T \mathbf{r}(\mathbf{x} \mid \mathbf{h})\} > 0$, $\epsilon > 0$, in the neighborhood of $\mathbf{h}_0$.  (4.13)

C. $E\{\mathbf{r}(\mathbf{x} \mid \mathbf{h})^T \mathbf{r}(\mathbf{x} \mid \mathbf{h})\} \le d(1 + \|\mathbf{h} - \mathbf{h}_0\|^2)$, $d > 0$.  (4.14)

A special infinite sequence of the form

$\mu_n = \dfrac{2.0}{\lambda_{max}} \dfrac{1}{n^i}$;  $0.5 < i \le 1.0$  (4.15)

is proposed as the gain sequence used in the recursive algorithm used to obtain the matrix filter. The term $\lambda_{max}$ represents the maximum eigenvalue of the input covariance matrix $G_{xx}(f)$. The derivation of the gain constant $(2.0/\lambda_{max})$ is contained in Section 5.1.1.

The following proofs use the theory of infinite series to establish that the special infinite gain sequence used in the stochastic approximation algorithm satisfies Conditions A1, A2 and A3. These conditions are reproduced below.

A1. $\mu_n > 0$  (4.16)

A2. $\sum_{n=1}^{\infty} \dfrac{2}{\lambda_{max}} \dfrac{1}{n^i} = \infty$ for $0.5 < i \le 1.0$  (4.17)

A3. $\sum_{n=1}^{\infty} \left[\dfrac{2}{\lambda_{max}} \dfrac{1}{n^i}\right]^2 < \infty$ for $0.5 < i \le 1.0$  (4.18)

A3'. $\lim_{n \to \infty} \dfrac{2}{\lambda_{max}} \dfrac{1}{n^i} = 0$ for $0.5 < i \le 1.0$  (4.19)

Condition A3' is added to the list to illustrate how the sequence approaches zero. This result is important in assessing the effect of the infinite sequence, $\mu_n$, on the convergence rate of the recursive algorithm.

The following results are used in the proofs of Conditions A1, A2, A3 and A3'.

Theorem 4.2

A series $\sum a_k$ of positive terms is convergent if and only if its partial sums, $S_n = \sum_{k=0}^{n} a_k$, are bounded. If the partial sums are unbounded, then the series is divergent to the value $+\infty$.

Theorem 4.3

The sequence $a_n$ is convergent and converges to a limit $L$ if, for each $\epsilon > 0$, there is a positive integer $N$ such that $|a_n - L| < \epsilon$ for all $n > N$.

Proof of Condition A1

Since the eigenvalues of a complex Hermitian matrix [the input covariance matrix, $G_{xx}(f)$] are real (Wilkinson, 134) and, for a positive definite covariance matrix, positive, and $n$ is greater than zero, we can guarantee satisfaction of Condition A1.

The series $\sum_{n=1}^{\infty} \dfrac{1}{n^i}$ is designated as the harmonic series with the exponent $i$. For $i < 1$, its partial sums are greater than those of the series $\sum \dfrac{1}{n}$ and, by comparison with the series $\sum \dfrac{1}{n}$, the harmonic series with exponent $i$ will be shown to be divergent. The fact that the harmonic series $\sum \dfrac{1}{n}$ is divergent (Knopp, 69) is used in the following comparison test.

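Proof of Condition A2

Let us observe that $n^k \le n$ for $k \le 1$. Then, considering the $2^n$-th partial sum, we get

$S_{2^n} = 1 + \dfrac{1}{2^k} + \left(\dfrac{1}{3^k} + \dfrac{1}{4^k}\right) + \left(\dfrac{1}{5^k} + \ldots + \dfrac{1}{8^k}\right) + \ldots + \left(\dfrac{1}{(2^{n-1} + 1)^k} + \ldots + \dfrac{1}{(2^n)^k}\right)$

$\ge 1 + \dfrac{1}{2} + \left(\dfrac{1}{3} + \dfrac{1}{4}\right) + \left(\dfrac{1}{5} + \ldots + \dfrac{1}{8}\right) + \ldots + \left(\dfrac{1}{2^{n-1} + 1} + \ldots + \dfrac{1}{2^n}\right)$

$\ge 1 + \dfrac{1}{2} + \left(\dfrac{1}{4} + \dfrac{1}{4}\right) + \left(\dfrac{1}{8} + \ldots + \dfrac{1}{8}\right) + \ldots + \left(\dfrac{1}{2^n} + \ldots + \dfrac{1}{2^n}\right)$

$= 1 + \dfrac{1}{2} + \dfrac{1}{2} + \dfrac{1}{2} + \ldots + \dfrac{1}{2} = 1 + \dfrac{n}{2}$.  (4.20)

Hence, the partial sums are unbounded and therefore the series diverges (Theorem 4.2).

By a similar method, it can be shown that, for $i > 1$, the harmonic series with exponent $i$ is convergent.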


Proof of Condition A3

Condition A3 is obvious once it is recognized that the series can be written in the form

$\sum_{n=1}^{\infty} \left(\dfrac{1}{n^i}\right)^2 = \sum_{n=1}^{\infty} \dfrac{1}{n^{2i}} = \sum_{n=1}^{\infty} \dfrac{1}{n^J}$,  $J = 2i$.  (4.21)

Since it is known that the series converges for $J > 1$, it only remains to bound $i > 0.5$ to satisfy Condition A3.
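
The two series arguments above can be checked numerically. The sketch below is an illustration only, with a hypothetical helper `harmonic_partial_sum` and the constant factor $2/\lambda_{max}$ omitted since it does not affect convergence. It verifies the bound $S_{2^n} \ge 1 + n/2$ of Equation (4.20) for several exponents $i \le 1$, and shows that the squared series with $J = 2i > 1$ stays below the integral-test bound $1 + 1/(J - 1)$.

```python
def harmonic_partial_sum(exponent, terms):
    """Partial sum of the harmonic series with the given exponent."""
    return sum(1.0 / n ** exponent for n in range(1, terms + 1))


# Condition A2: the 2**n-th partial sum is at least 1 + n/2 when exponent <= 1.
a2_holds = all(
    harmonic_partial_sum(i, 2 ** n) >= 1.0 + n / 2.0 - 1e-12
    for i in (0.6, 0.75, 1.0)
    for n in range(1, 15)
)

# Condition A3: with J = 2i > 1 the partial sums are bounded by 1 + 1/(J - 1).
J = 2 * 0.75
squared_sum = harmonic_partial_sum(J, 10 ** 5)
a3_bound = 1.0 + 1.0 / (J - 1.0)

print(a2_holds, squared_sum, a3_bound)
```

The lower bound grows without limit as $n$ increases, while the squared series remains under a fixed bound, which is the behavior required of the gain sequence by Conditions A2 and A3.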


The proof of Condition A3' uses the results of Theorem 4.3.

Proof of Condition A3'

If $\epsilon$ is an arbitrarily small positive number and $N$ is any positive integer greater than $\dfrac{2}{\lambda_{max}\, \epsilon}$, the $m$-th term of the sequence is $\dfrac{2}{\lambda_{max}} \dfrac{1}{m}$. If $m > N$, then

$|a_m - 0| = \dfrac{2}{\lambda_{max}} \dfrac{1}{m} < \dfrac{2}{\lambda_{max}} \dfrac{1}{N} < \epsilon$  (4.22)

which shows that $|a_m - 0| < \epsilon$ for all $m > N$ and, therefore, the sequence $\{a_m\}$ converges to zero. The proof has been for $i = 1$ but it holds for all $i$ such that $0.5 < i \le 1$.


The preceding proofs have shown that the specially derived gain sequence does satisfy the convergence Conditions (A1, A2, A3 and A3') on the infinite sequence $\mu_n$. In the following analysis, results obtained in Tuteur and Chang (123) are used to show the algorithm satisfies Conditions B and C. We know from results of the orthogonal projection lemma that the mean square error, $\varepsilon(\mathbf{h})$, has a unique minimum. We can take a derivative of the mean square error and see that for every $i$ we get

$\dfrac{\partial \varepsilon}{\partial h_i} > 0$ for $h_i > h_{i0}$

$\dfrac{\partial \varepsilon}{\partial h_i} = 0$ for $h_i = h_{i0}$

$\dfrac{\partial \varepsilon}{\partial h_i} < 0$ for $h_i < h_{i0}$.  (4.23)
Equation (4.14) is just a formal statement indicating the behavior of a convex function. Consequently, with the regression function equal to the gradient in the real variable case, we see that

$(h_i - h_{i0}) \dfrac{\partial \varepsilon}{\partial h_i} \ge 0$ for all $i$  (4.24)

and Condition B is satisfied.

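If we assume each $x$ (input signal) satisfies $E\{x^2\} < \infty$, then the proof that the algorithm satisfies Condition C is as follows:

Proof

If

$\nabla \varepsilon(\mathbf{h}) = \nabla \varepsilon|_{\mathbf{h}_0} + J(\mathbf{h} - \mathbf{h}_0)$  (4.25)

where $J = 2\mathbf{x}\mathbf{x}^T$, then

$\nabla \varepsilon^T \nabla \varepsilon = (\nabla \varepsilon)^T|_{\mathbf{h}_0} (\nabla \varepsilon|_{\mathbf{h}_0}) + 4 (\nabla \varepsilon)^T|_{\mathbf{h}_0}\, \mathbf{x}\mathbf{x}^T (\mathbf{h} - \mathbf{h}_0) + 4(\mathbf{h} - \mathbf{h}_0)^T \mathbf{x}\mathbf{x}^T \mathbf{x}\mathbf{x}^T (\mathbf{h} - \mathbf{h}_0)$.  (4.26)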


If  we  note  that  x x,  is  the  sum  of  the  power  in  the  input,  then 


%'a’l  ■ • 


(4.27) 


It is obvious that the first and second terms in Equation (4.26) are zero since, by definition,

$\mathbf{r}(\mathbf{x} \mid \mathbf{h})|_{\mathbf{h} = \mathbf{h}_0} = 0$.  (4.28)

Using the result from Beckenbach and Bellman (13) that $\mathbf{x}^T A \mathbf{x} \le \lambda_{max} \|\mathbf{x}\|^2$, with $\lambda_{max}$ the largest eigenvalue of $A$, gives

E{(Ve)T(Ve)>  - 4<h  - h„)T  "WWi"  V 

< - VT<i  - V <4-29) 

where $\lambda_{max}$ is the largest eigenvalue of matrix $G_{xx}$. Then from Equation (4.29), we see that

$E\{\nabla \varepsilon^T \nabla \varepsilon\} \le d[1 + (\mathbf{h} - \mathbf{h}_0)^T (\mathbf{h} - \mathbf{h}_0)]$  (4.30)

and Condition C is satisfied.

We have shown that for the real variable vector case, the proposed decoupling algorithm which derives the matrix filter satisfies the convergence conditions for stochastic approximation. The only difference in the proof for the complex case is that we cannot talk about gradients in the same sense as in the real variable case. It is shown that the complex case satisfies the five convergence conditions in strict mathematical fashion. The fifth condition is embedded in the definition of a convex function from classical analysis and it is difficult to prove in this context. It is obvious that the performance measure used is no more than quadratic in the filter coefficients, $H$, and therefore must satisfy the final condition. Since we can from Theorem 3.4 guarantee a unique minimum, the reasoning behind this condition is valid.

Recall from stochastic approximation [Equations (4.12), (4.13), (4.14)] the conditions which guarantee convergence of the iterative algorithm. The complex case can be shown to converge in the same sense as the real case. The convergence conditions on the variable gain, $\mu_n$, are satisfied by the choice

$\mu_n = \dfrac{2.0}{\lambda_{max}} \dfrac{1}{n^i}$;  $0.5 < i \le 1.0$.  (4.31)

This choice of $\mu_n$ will be derived in Section 5.1.1 from considerations of the idealized case of the recursive algorithm and the convergence conditions embedded in the stochastic approximation algorithm.

Since  the  performance  measure  has  been  shown  to  possess  a unique 
minimum  from  considerations  of  the  orthogonal  projection  lemma,  then  we 
know  that  Equation  (4.13)  is  satisfied. 

The condition that the mathematical expectation of the norm of the regression function increases, as $\mathbf{h}$ increases, no faster than a quadratic function of the filter coefficients, $\mathbf{h}$, can be seen to be satisfied from both intuitive and geometrical considerations. It is obvious from the performance functional itself, which is a quadratic function of the filter coefficients, $\mathbf{h}$, that the surface of the performance functional cannot increase faster than a quadratic function of the weights. This statement is equivalent to saying that the performance measure is convex near the minimum. Since we know that the error surface is bounded above by a quadratic and it has a unique minimum, it must satisfy the final condition. This reasoning follows that of Friedman (46) who showed that, if the given function can be bounded by some known function, then this condition is satisfied.

The  following  is  the  proof  of  Condition  C [Equation  (4.14)]  for 
the  complex  form  of  the  stochastic  recursive  algorithm  which  calculates 
the  matrix  filter  coefficients.  The  proof  is  done  for  the  vector  case 
but  can  be  easily  extended  for  each  vector  in  the  matrix  filter. 

Proof

The stochastic approximation algorithm can be written as

[Equation (4.32) is illegible in the source.]

where $y = H^T x$.

If we define

[Equation (4.33) is illegible in the source.]

then Equation (4.32) becomes

$$Z^*_{n+1} = [I - \mu_n x_n x_n^H] Z^*_n . \tag{4.34}$$

If we define the regression function as

$$r(z) = x_n x_n^H z^* , \tag{4.35}$$

then we get

$$E\{\|r(z)\|^2\} = E\{z^T (x_n x_n^H)^H (x_n x_n^H) z^*\}
 = z^T E\{(x_n x_n^H)^H (x_n x_n^H)\} z^* . \tag{4.36}$$

Since we know that

$$z^T A z^* \le \lambda_{\max} \|z\|^2 \tag{4.37}$$

from Beckenbach and Bellman (13), where

$$A = E\{(x_n x_n^H)^H (x_n x_n^H)\} , \tag{4.38}$$

and since $A$ is bounded (a physical process) and $\|z\|$ is bounded,
then to satisfy Condition C we need only choose $d$ to be
$\le \lambda_{\max}$ of $A$.

The  proof  of  Condition  C completes  the  proofs  that  show  that  the 
complex  form  of  the  recursive  algorithm  used  to  calculate  the  matrix 
adaptive  filter  satisfies  the  convergence  conditions  of  stochastic 
approximation. 


Mean Square Error

We can compute the mean square error as

$$\begin{aligned}
E\{e^H e\} &= E\{(d - y)^H (d - y)\} \\
 &= E\{(d - H^T x)^H (d - H^T x)\} \\
 &= E\{d^H d - 2 (H^T x)^H d + (H^T x)^H (H^T x)\} \\
 &= E\{d^H d - 2 y^H d + y^H y\} .
\end{aligned} \tag{4.39}$$

If we look at any of the three products in Equation (4.39), then
we can see that each expression is a sum of terms in which each term
depends on a particular beam. If we take the $d^H d$ product, we see
that

$$d^H d = d_1^* d_1 + d_2^* d_2 + d_3^* d_3 . \tag{4.40}$$

We can see from Equation (4.39) that the mean square error becomes

$$E\{e^H e\} = E\Big\{ \sum_{i=1}^{3} |e_i|^2 \Big\} . \tag{4.41}$$

The mean square error is seen from Equation (4.41) to be a sum of
terms. With this form of the mean square error, it becomes desirable
to look at each individual component of the minimum mean square error
separately. In the case derived in this research, where there is more
than one output, it is very instructive to see how the mean square error
is being reduced in each output. For instance, if the power in one beam
is one hundred times the power in another beam, then the interfering
noise could be removed completely from the low power beam and it would
not be detectable in the total mean square error, because an error of
1% or less in the filter coefficients that affect the mean square error
in the high power beam would completely mask the mean square error
reduction in the low power beam.
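
The masking effect described above can be made concrete with a small
sketch. It computes the per-beam terms of Equation (4.41) and their sum
from a list of error snapshots. The function name `per_beam_mse` and the
numerical values are hypothetical and chosen only for illustration.

```python
def per_beam_mse(errors):
    """errors: list of snapshots, each a list of complex per-beam errors e_i.

    Returns (list of per-beam mean |e_i|^2, their sum), i.e. the individual
    terms of Equation (4.41) and the total mean square error.
    """
    num_beams = len(errors[0])
    per_beam = [0.0] * num_beams
    for snapshot in errors:
        for i, e in enumerate(snapshot):
            per_beam[i] += abs(e) ** 2
    per_beam = [p / len(errors) for p in per_beam]
    return per_beam, sum(per_beam)


if __name__ == "__main__":
    # Hypothetical numbers: a large residual in the high power beam and a
    # small error in the low power beam that is removed completely.
    before = [[3 + 0j, 0.3j]]
    after = [[3 + 0j, 0j]]
    print(per_beam_mse(before))
    print(per_beam_mse(after))
```

In this example the low power beam's error is removed entirely, yet the
total changes by only about one percent, which is why the per-beam terms
are the more informative quantity.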


CHAPTER V


DYNAMIC PROPERTIES OF STOCHASTIC FILTER

5.0 Introduction

This chapter contains the analysis of the dynamic properties of
the matrix decoupling filter and an analysis of stochastic stability as
it applies to the adaptive algorithm. An optimum gain is derived to
obtain fast convergence properties. The use of this optimum gain has
been mentioned previously, and its properties are discussed in greater
detail when the computer simulation results of Chapter VI are given.

It  is  shown  that  the  stochastic  Lyapunov  functions  possess  the 
martingale  property.  This  was  first  recognized  by  Bucy  (21)  and 
Kushner  (73).  The  recognition  that  the  stochastic  Lyapunov  functions 
are  martingales  should  provide  impetus  for  future  results  in  the  area 
of  automatic  control  systems  and  adaptive  array  processing.  With  a 
readily  available  analysis  for  the  stability  of  stochastic  control 
systems, control systems designers will find stochastic systems
techniques easier to use. A firmer connection between adaptive array
processing and the vast literature of control systems is advantageous.

The last section contains the analysis that unifies all the LMS
type algorithms into the framework of stochastic approximation. It is
shown that algorithms which use a constant $\mu$ when the input signals
are random processes cannot, in general, obtain the advantages inherent
in the stochastic approximation algorithm.

5.1 Dynamic Properties of Stochastic Adaptive Filter

It is shown in this chapter, from considerations of the idealized
case for the decoupling algorithm, that the maximum gain constant can be
determined for use in the stochastic approximation algorithm. This
particular gain constant can be used as the starting value for a special
sequence in the context of stochastic approximation to greatly increase
the rate at which the matrix adaptive filter converges. The purpose of
this analysis, stated in the introduction, was to design an algorithm
which operates in real time and yet increases the output signal-to-noise
ratio so as to increase the probability of detection. With this special
gain sequence, these objectives are satisfied. It is shown that this
optimum gain is the largest allowable in the idealized case and
represents the borderline of the stability region in the context of
state variable control system analysis. If, however, this gain were to
be used in a deterministic algorithm, the misadjustment of the weights
and the noisiness of the system would be extremely large. This is not
the case with stochastic approximation, because this mathematical
technique takes into account the stochastic nature of the system and is
insensitive to this type of misadjustment.

5.1.1 Optimum Gain Calculation. To derive the special gain
sequence used in the stochastic approximation algorithm, the convergence
properties of the idealized form of the stochastic algorithm are
explored. If the recursive algorithm is rearranged into the following
form:

$$H^*_{n+1} = [I - \mu G_{xx}] H^*_n + \mu G_{xd} , \tag{5.1}$$

then a substitution of the form

$$Z^*_{n+1} = H^*_{n+1} - G_{xx}^{-1} G_{xd} \tag{5.2}$$

reduces Equation (5.1) to

$$\begin{aligned}
H^*_{n+1} &= Z^*_{n+1} + G_{xx}^{-1} G_{xd} \\
 &= [I - \mu G_{xx}] [Z^*_n + G_{xx}^{-1} G_{xd}] + \mu G_{xd} \\
 &= [I - \mu G_{xx}] Z^*_n + [I - \mu G_{xx}] G_{xx}^{-1} G_{xd} + \mu G_{xd} .
\end{aligned} \tag{5.3}$$

Simplifying, we have:

$$Z^*_{n+1} = [I - \mu G_{xx}] Z^*_n . \tag{5.4}$$


Equation (5.1) neglects the stochastic nature of the algorithm
because it assumes that the cross spectral density matrix $G_{xx}$ is
either given or can be computed exactly from input data. This
assumption becomes important when the positive definiteness of the
cross spectral density matrix $G_{xx}$ is required. The above equation
also assumes a constant gain instead of a variable gain.

Let $U$ be a unitary transformation which diagonalizes the power
spectral density matrix $G_{xx}$, i.e.,

$$U^H G_{xx} U = \Lambda . \tag{5.5}$$

[Equation (5.6) is illegible in the source.]

The unitary transformation $U$ exists because $G_{xx}$ is a positive
definite Hermitian matrix. The algorithm becomes:

$$U^H Z^*_{n+1} U = U^H [I - \mu G_{xx}] U U^H Z^*_n U \tag{5.7}$$

$$\qquad = [U^H I U - \mu U^H G_{xx} U] Q^*_n \tag{5.8}$$

$$\qquad = [I - \mu \Lambda] Q^*_n , \tag{5.9}$$

where

$$Q^*_{n+1} = U^H Z^*_{n+1} U . \tag{5.10}$$

The convergence of Equation (5.9) can be easily established by
examining the sequence of solutions of $Q^*_{n+1}$. If we assume that
$Q^*_0$ is the initial condition, then the sequence of solutions is

$$Q^*_1 = [I - \mu \Lambda] Q^*_0 , \tag{5.11}$$

$$Q^*_2 = [I - \mu \Lambda] Q^*_1 = [I - \mu \Lambda]^2 Q^*_0 , \tag{5.12}$$

$$Q^*_{n+1} = [I - \mu \Lambda]^{n+1} Q^*_0 . \tag{5.13}$$

The limit

$$\lim_{n+1 \to \infty} Q^*_{n+1} = 0 \tag{5.14}$$

holds if and only if

$$\| [I - \mu \Lambda] \| < 1 , \tag{5.15}$$

where the expression $\| \cdot \|$ is any suitably defined norm.

Since it has already been assumed that $\mu > 0$ in the derivation
of the stochastic approximation algorithm, the only condition on $\mu$
for satisfaction of the norm inequality is that
$\mu \lambda_{\max} < 2$, i.e.,

$$0 < \mu < \frac{2}{\lambda_{\max}} . \tag{5.16}$$

Since

$$Q^*_{n+1} = U^H Z^*_{n+1} U \tag{5.17}$$

and

$$\lim_{n+1 \to \infty} Q^*_{n+1} = 0 , \tag{5.18}$$

then

$$\lim_{n+1 \to \infty} Z^*_{n+1} = 0 . \tag{5.19}$$

Equation (5.19) implies that the actual filter converges to the optimum
filter, that is,

$$\lim_{n+1 \to \infty} H^*_{n+1} = G_{xx}^{-1} G_{xd} \tag{5.20}$$

if $\mu$ satisfies

$$0 < \mu < \frac{2}{\lambda_{\max}} , \tag{5.21}$$

where $\lambda_{\max}$ is the maximum eigenvalue of the input power
spectral density matrix.
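
The bound in Equation (5.21) can be checked numerically on the decoupled
form of Equation (5.9), where each loop evolves independently as
q_i <- (1 - mu * lambda_i) * q_i. The sketch below is a minimal
illustration with real eigenvalues; the eigenvalues, gains, and function
names are hypothetical and not taken from the text.

```python
def run_decoupled(eigenvalues, mu, q0, steps):
    """Iterate q_i <- (1 - mu * lambda_i) * q_i for each decoupled loop."""
    q = list(q0)
    for _ in range(steps):
        q = [(1.0 - mu * lam) * qi for lam, qi in zip(eigenvalues, q)]
    return q


def max_stable_gain(eigenvalues):
    """Upper bound 2 / lambda_max on the constant gain, Equation (5.21)."""
    return 2.0 / max(eigenvalues)


if __name__ == "__main__":
    lams = [4.0, 1.0, 0.25]           # hypothetical decoupled powers
    bound = max_stable_gain(lams)     # 0.5 for these values
    print(run_decoupled(lams, 0.9 * bound, [1.0, 1.0, 1.0], 200))
    print(run_decoupled(lams, 1.1 * bound, [1.0, 1.0, 1.0], 200))
```

A gain just below the bound drives every loop to zero, while a gain just
above it makes the loop associated with the largest eigenvalue grow
without limit.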

The stochastic version of the algorithm [Equation (5.4)] is:

$$Z^*_{n+1} = [I - \mu x_n x_n^H] Z^*_n . \tag{5.22}$$

$Z^*_n$ is a function of previous input data, up to and including
$x_{n-1}$, but not of the current input $x_n$. Hence, if each new set of
input data represents statistically independent data, $Z^*_n$ is
statistically independent of present data, and the expected value of
Equation (5.22) is:

$$E\{Z^*_{n+1}\} = E[I - \mu x_n x_n^H] \, E\{Z^*_n\}
 = [I - \mu G_{xx}] \, E\{Z^*_n\} . \tag{5.23}$$


Equation (5.23) shows that the expected value of the filter is
governed by the same algorithm as the idealized non-stochastic algorithm
when it is assumed that successive input samples are statistically
independent. The same is true if the weakened hypothesis that the
successive input data are uncorrelated is used. The expression for the
second order statistics of $Z^*_{n+1}$ is more difficult to obtain, and
it has not been computed since it is not needed in this analysis.
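
Equation (5.23) can be illustrated with a small Monte Carlo experiment.
The sketch below is a scalar, real-valued stand-in for Equation (5.22),
assuming independent zero-mean Gaussian inputs of variance `power`; the
function names, seed, and parameter values are illustrative assumptions,
not part of the original analysis.

```python
import random


def mean_error_after(steps, mu, power, trials, seed=0):
    """Average z_n over independent runs of z <- (1 - mu * x * x) * z, z_0 = 1,
    with x drawn as real zero-mean Gaussian samples of variance `power`."""
    rng = random.Random(seed)
    sigma = power ** 0.5
    total = 0.0
    for _ in range(trials):
        z = 1.0
        for _ in range(steps):
            x = rng.gauss(0.0, sigma)
            z *= 1.0 - mu * x * x
        total += z
    return total / trials


def idealized_error_after(steps, mu, power):
    """Scalar form of the idealized recursion: (1 - mu * G)^n with z_0 = 1."""
    return (1.0 - mu * power) ** steps


if __name__ == "__main__":
    print(mean_error_after(5, 0.1, 1.0, 20000))
    print(idealized_error_after(5, 0.1, 1.0))
```

Individual runs differ from one another, but their average follows the
idealized recursion, which is the content of Equation (5.23).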

The rate of convergence can be determined from Equation (5.13):

$$Q^*_{n+1} - Q^*_n = [I - \mu \Lambda]^n [I - \mu \Lambda - I] Q^*_0
 = -[I - \mu \Lambda]^n \mu \Lambda Q^*_0 . \tag{5.24}$$

In the case when $\mu \lambda_i \ll 1$, the rate of convergence for
small $n$ in each independent loop is proportional to $\mu \lambda_i$,
where $\lambda_i$ is equal to the decoupled power in that loop. This
statement comes from the fact that Equation (5.5) assumed that there
exists a unitary transformation such that

$$U^H G_{xx} U = \Lambda \tag{5.25}$$

and $\Lambda$ is a diagonal matrix with the decoupled eigenvalues on the
diagonal. The initial rate of change is:

$$Q^*_1 - Q^*_0 = -\mu \Lambda Q^*_0 , \tag{5.26}$$

where $1/(\mu \lambda_i)$ can be interpreted as the time constant of
each loop. This relation has been observed in Widrow's LMS algorithm.
Since the gain used in the stochastic approximation algorithm is time
variable, it is difficult to assess speed of response in exactly the
same sense as the constant gain case.
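
The time-constant interpretation can be made explicit for a single
decoupled loop. The short derivation below is a sketch for the constant
gain case; the scalar $q_i(n)$, denoting the $i$th component of
$Q^*_n$, is notation introduced here for illustration.

```latex
% One decoupled loop of Equation (5.13), constant gain \mu:
q_i(n) = (1 - \mu\lambda_i)^n \, q_i(0)
       = \exp\!\big(n \ln(1 - \mu\lambda_i)\big)\, q_i(0)
       \approx \exp(-n \mu \lambda_i)\, q_i(0),
       \qquad \mu\lambda_i \ll 1 ,
% so the loop decays by a factor 1/e in about
\tau_i \approx \frac{1}{\mu \lambda_i} \ \text{iterations.}
```

Loops with small decoupled power therefore relax slowly, and the
slowest loop sets the overall settling time.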

We can observe that the underlying relaxation phenomenon which
takes place in the weight values (filter coefficients) for the idealized
case is of exponential nature and, since the mean square error is a
quadratic form in the weight values, the transients in the mean square
error function must also be exponential in nature. The transients
consist of sums of exponentials with time constants given by:

[Equation (5.27) is illegible in the source.]

where $\lambda_i$ is the ith eigenvalue of the input signal correlation
matrix.

It has been shown by Widrow (133) that the misadjustment of the basic
adaptive element using the pilot vector algorithm is given by

[Equation (5.28) is illegible in the source.]

It can be seen that the misadjustment of the deterministic
gradient algorithms is proportional to the time constants of the filter
weight adjustment procedure. It has been observed experimentally that
the stochastic approximation algorithm performs initially in a similar
manner. As time goes on, the stochastic approximation algorithm has
much less misadjustment than is possible with the deterministic gradient
algorithms because of the smoothing properties of the gain sequence.

It will be seen in Chapter VI, in the plots of S/N for different
starting $\mu$'s, that the speed of convergence and the misadjustment of
the algorithm are directly related to the size of the starting $\mu$'s.
The smaller the starting value, the slower the algorithm converges, but
there is less noise in the algorithm and the S/N is a smooth curve. The
price one pays for this less noisy system is slow convergence. The
greatest benefit that accrues to the system designer from the use of the
special gain sequence is the fast initial increase in S/N without the
drawback of noisy matrix filter coefficients.

It is difficult to compare the constant $\mu$ case with the
stochastic approximation algorithm because the stochastic algorithm is
essentially a time varying feedback system and the constant $\mu$ case
is a constant feedback system. Initially, the two are comparable if the
same starting gain is used, but the time varying gain possesses the
ability to smooth out the misadjustment of the filter weights due to
its statistical properties.

In the deterministic gradient algorithms, it is impossible to
obtain both fast convergence and small misadjustment. The stochastic
approximation technique allows fast initial convergence with large
misadjustment, but as the number of iterations becomes large, the
misadjustment (variance of the filter coefficients) will automatically
go to zero. This variance reduction is a very valuable result for
systems which can allow long running times for the recursive algorithm.
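
The contrast between a constant gain and a decaying gain can be seen in
a very small experiment. The sketch below is a scalar illustration, not
the matrix filter of this report: it estimates a constant level of 1.0
from noisy observations with the update h <- h + mu_n * (d_n - h), once
with a constant gain and once with a gain that starts at the same value
and decays as 1 / n**0.75. All names, the exponent, and the noise level
are assumptions made for the example.

```python
import random


def final_sq_error(gains, noise_std, rng):
    """Run h <- h + mu_n * (d_n - h) with d_n = 1 + noise; return (h - 1)^2."""
    h = 0.0
    for mu in gains:
        d = 1.0 + rng.gauss(0.0, noise_std)
        h += mu * (d - h)
    return (h - 1.0) ** 2


def compare(num_steps=200, trials=500, mu0=0.5, seed=1):
    """Return (mean squared final error, constant gain), (same, decaying gain)."""
    rng = random.Random(seed)
    constant = [mu0] * num_steps
    decaying = [mu0 / (n ** 0.75) for n in range(1, num_steps + 1)]
    mse_const = sum(final_sq_error(constant, 1.0, rng)
                    for _ in range(trials)) / trials
    mse_decay = sum(final_sq_error(decaying, 1.0, rng)
                    for _ in range(trials)) / trials
    return mse_const, mse_decay


if __name__ == "__main__":
    print(compare())
```

Both versions start with the same gain, so their early behavior is
similar, but only the decaying gain continues to reduce the variance of
the final estimate as the number of iterations grows.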

5.2  Stochastic  Stability  Considerations 

At this point, it has been proven that if the recursive algorithm
satisfies the convergence conditions imposed by stochastic
approximation, then it converges with probability one to the optimum
filter. It was noted in the proof that the sequence of matrix filter
coefficients is a martingale. The fact that the sequence of filters is
a martingale is not only useful in proving convergence in the sense of
stochastic approximation, but also in showing that the stochastic
adaptive filter algorithm is stable in the sense of stochastic stability
of a control system. Appendix C contains some useful definitions
concerning martingales.

There are many examples of the use of martingales in the
engineering literature. It is well known that the Wiener process is a
martingale (McGarty, 83), certain integrals used in stochastic
differential equations are martingales (McShane, 85), and the likelihood
ratios used in statistical detection theory (McGarty, 83) are
martingales.

It is well known that the stability of deterministic dynamical
systems can be proven if we can construct Lyapunov functions that
satisfy certain conditions (Sage, 108). Using Ogata (91), we can define
a Lyapunov function. Let us suppose there exists a scalar function
$V(x)$, continuous in $x$, such that:

$$V(x) > 0 \quad \text{for } x \ne 0 \tag{5.29}$$

and

$$\Delta V(x) < 0 \quad \text{for } x \ne 0 , \tag{5.30}$$

where $\Delta V(x_n) = V(x_{n+1}) - V(x_n)$,

$$V(0) = 0 \tag{5.31}$$

and

$$V(x) \to \infty \quad \text{as } \|x\| \to \infty . \tag{5.32}$$

Then, the equilibrium state $x = 0$ is asymptotically stable in the
large and $V(x)$ is a Lyapunov function. These conditions are
equivalent to saying that there exists a positive definite continuous
function of $x$, $V(x)$, such that its time derivative along a system
trajectory is nonpositive definite. If the Lyapunov function, $V(x)$,
satisfies

$$V(x_1) > V(x_2) > \ldots > V(x_n) \tag{5.33}$$

for the discrete system $\{x_n, \; n = 0, 1, \ldots\}$, then the
stability of the origin is implied (Aoki, 8).

The following analysis of stochastic stability uses analysis and
results contained in Aoki (8). Using these results, we can establish
the stability conditions for a dynamic system.

If we construct the state variable model for the recursive
adaptive filter algorithm (Figure 9), then it is possible to construct
a stochastic Lyapunov function, $V(h)$, similar to that used in
deterministic systems. Consider the discrete time stochastic dynamical
model of the adaptive filter described by (for the vector case):

[Equation (5.34) is illegible in the source.]

where $h_n$ is the n-dimensional state vector (filter coefficients),
$\mu_n$ is the m-dimensional control vector (gain sequence), and $x_n$
and $y_n$ are the random vectors of inputs and outputs.

For a given control policy, a collection of state trajectories
is possible for a given initial state $h_0$, depending on the
realizations of the input stochastic processes. This result differs
from that of a deterministic system where, for a given control policy,
only one unique state trajectory results. Due to the stochastic nature
of the input processes and the resulting adaptive filter, the behavior
of the Lyapunov functions must be considered in probabilistic terms. It
is thus appropriate to replace Equation (5.33) for deterministic systems
by the following one for stochastic systems (Aoki, 8):

$$E\{V(x_1)\} \ge E\{V(x_2)\} \ge \ldots \ge E\{V(x_n)\} . \tag{5.35}$$

This condition has exactly the same intuitive meaning as in the
deterministic case except that now the definition is in terms of
expected values.

The idea that stochastic Lyapunov functions are martingales was
recognized by Bucy (21) and Kushner (73), and use is made of this fact
in proving stochastic stability. Bucy arrived at the relationship
described by Equation (5.35) by noting that the Lyapunov function
satisfies a martingale property. Bucy noted that for any realization
of the stochastic process $\{x_n\}$,

$$E\{V(x_n) \mid x_0, x_1, \ldots, x_{n-1}\} \le V(x_{n-1}) . \tag{5.36}$$

If we use the equality

$$E\{V(x_n)\} = E\{E[V(x_n) \mid x_0, x_1, \ldots, x_{n-1}]\} , \tag{5.37}$$

then we can conclude that the expected value of $V(x_n)$ is not greater
than the last value $V(x_{n-1})$. It is precisely the property in
Equation (5.36) which allows us to write Equation (5.35) and conclude
that, for a stable dynamical system, the Lyapunov function must be a
supermartingale (Appendix C).

By direct analogy with deterministic systems, this behavior of
stochastic Lyapunov functions means that the conditional expected
generalized energy is not increasing with time and the system can be
regarded as stable in the stochastic sense.
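
A scalar worked example may help fix the idea. It is an illustration
under assumptions that are not part of the original analysis: a real
scalar version of Equation (5.22), $z_{n+1} = (1 - \mu x_n^2) z_n$, with
$x_n$ zero-mean Gaussian of variance $g$ and independent of $z_n$, and
the candidate Lyapunov function $V(z) = z^2$.

```latex
E\{V(z_{n+1}) \mid z_n\}
  = E\{(1 - \mu x_n^2)^2\}\, z_n^2
  = \big(1 - 2\mu g + 3\mu^2 g^2\big)\, V(z_n)
  \qquad \big(E\{x_n^4\} = 3 g^2 \text{ for a real Gaussian}\big),
% which does not exceed V(z_n) whenever
0 \le \mu \le \frac{2}{3g} .
```

For gains in this range the expected generalized energy does not
increase from one step to the next, which is the supermartingale
property of Equation (5.36) in this simple setting.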

Thus, proving that the stochastic process representing the filter
coefficients is a martingale not only allows us to prove convergence
in the sense of stochastic approximation, but also allows us to
show that the stochastic process of filter coefficients is stable in
the sense of a stochastic control system.

5.3 Unification of Previous Adaptive Systems Using Stochastic
Approximation

In this section, it is shown that the LMS type deterministic
algorithms can be put in the form of stochastic approximation. Other
algorithms, besides the LMS algorithms, can be unified into the
framework of stochastic approximation, but the unification will only
be carried out for the LMS type algorithms. In fact, all algorithms that
result from the minimization of the mean square error performance
measure can be shown to fit the framework of stochastic approximation
if certain conditions on the gain sequence, $\mu_n$, and the regression
function are satisfied. The orthogonal projection lemma, as it was in
the derivation of the stochastic approximation algorithm, is used to
derive the regression function for use in the recursive algorithm which
determines the filter weights. Since these LMS type deterministic
algorithms do not satisfy the convergence conditions of stochastic
approximation, only a weaker convergence can be proved for them. This
lack of strong convergence is an extremely important weakness of these
adaptive techniques because these algorithms work with stochastic
processes, and strong statistical convergence properties are necessary
if the recursive algorithms are to be applied in a wide variety of
situations.

The Widrow adaptive filter algorithm is:

$$h_{n+1} = h_n + \mu X_n (d_n - y_n) , \tag{5.38}$$

where $h_n$ is the weight vector, $d_n$ is the pilot signal, $y_n$ is
the output of the filter, and $X_n$ is a vector of signals composed of
various delayed versions of $x_n$, the output of the sensors.

The Widrow algorithm can now be put into the framework of
stochastic approximation. It is desired to obtain the tapped delay line
filter weights which make the output, $y_n$, the minimum mean squared
error (MMSE) estimate of $d_n$, a pilot signal. The direction of arrival
and the statistics of the pilot signal are assumed known. If we want to
find a filter where

$$E\{[d_n - y_n]^2\} = \text{minimum} , \tag{5.39}$$

then the orthogonal projection lemma says this objective is achieved
when the error is orthogonal to the signal:

$$E\{x_n (d_n - y_n)\} = 0 . \tag{5.40}$$

An identity from the theory of conditional expectation (Karlin
and Taylor, 66), $E\{g(x) f(y)\} = E\{E[g(x) \mid y] f(y)\}$, allows us
to write Equation (5.40) as:

$$E\{E[x_n (d_n - y_n) / h_n]\} = 0 . \tag{5.41}$$

This equation is satisfied for

$$E\{x_n [d_n - y_n] / h_n\} = 0 . \tag{5.42}$$

Using Equation (5.42) as a regression function, we can write a
recursive algorithm of the form

$$h_{n+1} = h_n + \mu r_n \tag{5.43}$$

with the regression function

$$r(h) = E\{r_n / h_n\} = E\{x_n [d_n - y_n] / h_n\} . \tag{5.44}$$

Equation (5.43) and Equation (5.44) constitute Widrow's weight
adjustment algorithm. Since this algorithm, as implemented by Widrow,
does not satisfy the convergence conditions of stochastic approximation,
any convergence must be in a statistically weaker sense than that for
stochastic approximation.

To derive Griffiths' algorithm, it is necessary to define the
input vector $x_n$ as a sum of signal vector $s_n$ and noise vector
$n_n$. We also assume that the signal to be estimated, $d_n$, is
uncorrelated with every signal in the noise vector $n_n$, i.e.,
$E\{d_n n_n^*\} = 0$.

With these assumptions, the regression function can be derived
using the preceding analysis.

The regression function becomes

$$E\{r_n / h_n\} = [\text{right-hand side illegible in the source}] \tag{5.45}$$

Griffiths' algorithm has the same drawback as Widrow's algorithm.
Even though the regression function, and hence the resulting sequence
of weight vectors, is a stochastic process, the use of a constant $\mu$
is suggested by Griffiths. This algorithm therefore does not satisfy
the convergence conditions of stochastic approximation, and only
convergence of the mean of the weights can be proven. It is to be noted
that, while convergence of the mean of the filter weights is valid,
large errors or misadjustments can occur for different realizations of
the stochastic process that describes the filter weights. The
convergence results for Griffiths' algorithm are weaker than convergence
in mean square or convergence with probability one. The weakness of
this form of convergence is intimately related to the usefulness of the
algorithm. In many applications, an algorithm with only this type of
convergence is, in general, not useful.
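
The constant-gain recursion of Equations (5.43) and (5.44), with the
expectation replaced by the instantaneous product x_n (d_n - y_n), can
be sketched as follows. This is a minimal real-valued illustration of an
LMS type update on a tapped delay line, not a reproduction of Widrow's
or Griffiths' implementations. The helper `fir`, the tap values, the
gain, and the noise-free pilot signal are all assumptions made for the
example.

```python
import random


def fir(inputs, taps):
    """Pass `inputs` through a tapped delay line with fixed weights `taps`."""
    out = []
    x = [0.0] * len(taps)              # delay line, newest sample first
    for sample in inputs:
        x = [sample] + x[:-1]
        out.append(sum(t * xi for t, xi in zip(taps, x)))
    return out


def lms(inputs, desired, num_taps, mu):
    """Constant-gain update h <- h + mu * x_n * (d_n - y_n), y_n = h . x_n."""
    h = [0.0] * num_taps
    x = [0.0] * num_taps               # delay line, newest sample first
    for sample, d in zip(inputs, desired):
        x = [sample] + x[:-1]
        y = sum(hi * xi for hi, xi in zip(h, x))
        err = d - y
        h = [hi + mu * err * xi for hi, xi in zip(h, x)]
    return h


if __name__ == "__main__":
    rng = random.Random(0)
    u = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
    d = fir(u, [0.8, -0.3])            # noise-free pilot, hypothetical taps
    print(lms(u, d, 2, 0.1))
```

With a noise-free pilot signal the weights settle on the generating
taps. When noise is present, a constant gain leaves a residual
fluctuation in the weights, which is the misadjustment discussed above.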

All  algorithms  that  result  from  the  minimization  of  mean  square 
error  can  be  shown  to  fit  the  framework  of  stochastic  approximation. 


However,  the  application  of  adaptive  techniques  to  stochastic  processes 
without  the  benefit  of  stochastic  approximation  techniques  results  in  a 
reduced statistical performance of the adaptive processor.


CHAPTER VI


COMPUTER  SIMULATION  RESULTS 

6.0  Introduction 

This  chapter  contains  a description  of  the  three  beam  system  used 
to  test  the  stochastic  adaptive  filter  and  a discussion  of  the  different 
tests  applied  to  the  algorithm.  A discussion  of  the  results  of  the 
computer  simulation  tests  for  normal  operation  is  given  for  a sample 
from  each  of  the  generic  cases. 

The use of implicit constraints, derived from physical
considerations, is shown to give the same results as the cases without
constraints.  The  only  difference  is  in  the  minimum  allowable  mean 
square  error.  It  is  shown  that  both  the  constrained  and  unconstrained 
modes  give  the  same  result  for  the  short  term  operation  of  the  algorithm. 
The  physical  and  mathematical  interpretations  of  the  implicit  constraints 
are  given  and  a justification  for  their  use  is  established. 

The speed of convergence and the noisiness of the recursive
algorithm are shown to depend on both the special gain constant and
the infinite sequence $1/n^l$. Changing these two parameters gives a
spectrum of results. The results show that the special gain sequence
gives a fast initial increase in output S/N and thus allows the
algorithm to operate in an on-line environment. The ease with which
the parameters needed in the adaptive algorithm can be obtained in a
practical case is described. A fast eigenvalue and eigenvector
calculation is described in this chapter. The analytical method of
eigenvalue calculation is called the QL algorithm (Wilkinson, 135).
This method of eigenvalue determination, basically a series of
similarity transformations, is fast enough for on-line use.

A stopping  rule  is  proposed  for  use  in  future  work.  The  stopping 
rule  uses  an  averaged  Frobenius  norm  to  determine  accuracy  in  the 
probabilistic  sense.  From  a theoretical  point  of  view,  stopping  rules 
are  necessary  to  determine  the  long  term  accuracy  of  the  technique.  No 
general  stopping  rule  has  been  devised  for  stochastic  approximation 
algorithms.  The  matrix  filter  is  only  calculated  for  one  frequency 
since  the  calculation  is  the  same  for  each  frequency. 

6.1  Beam  Construction  and  Signal  Generation 

A three-beam system has been simulated to test the adaptive filter
derived from the stochastic approximation algorithm. In this
simulation, $s_1$ will represent the signal in Beam 1, and all other
signals will be considered interfering noises. The same is true of
Beams 2 and 3, where $s_2$ and $s_3$, respectively, will represent the
signals in their beams and all other signals will be considered
interfering noises.

The beams have been constructed in the following manner:

$$\text{Beam 1} = x_1 = (s1)\,s_1 + (a)\,s_2 + (b)\,s_3 + (cc)\,sn_1 , \tag{6.1}$$

where

(s1)$s_1$ = the signal in main lobe one; s1 is a constant representing
            the angular position of signal $s_1$ in Beam 1.

(a)$s_2$ = a signal $s_2$ mainly in Beam 2, entering the side lobes of
           the main beam that contains $s_1$; a is a constant
           representing its position in Beam 1.

(b)$s_3$ = same form as (a)$s_2$ but with signal $s_3$ mainly in Beam 3
           and constant b representing its position in Beam 1.

(cc)$sn_1$ = any self noise contained in the same lobe as the main
             signal $s_1$.

$$\text{Beam 2} = x_2 = (f)\,s_1 + (s2)\,s_2 + (g)\,s_3 + (dd)\,sn_2 \tag{6.2}$$

$$\text{Beam 3} = x_3 = (h)\,s_1 + (p)\,s_2 + (s3)\,s_3 + (ee)\,sn_3 \tag{6.3}$$

Figure  4 shows  the  physical  description  of  signal  spatial 
position  relative  to  other  signals  as  a function  of  the  beam  pattern. 

The beams have been constructed in this manner so that an
analytical expression could be obtained for both the signal-to-noise
ratio and mean square error at the output of the matrix adaptive filter.
If  the  beams  are  not  constructed  in  this  manner,  then  a tractable 
mathematical  expression  for  either  performance  measure  would  not  be 
possible. 

Knowing the input statistics (mean, variance), it is possible to
calculate an estimate of the signal-to-noise ratio at the output of the
filter and compare the estimated theoretical value to the value obtained
when using the matrix filter. We use signal-to-noise ratio because it
is a good measure of detection performance. If we call the system
output $y(f)$, then we can define the signal-to-noise ratio (S/N) to be
the ratio of variance of signal power to the sum of the variances of the
noise power, i.e.,

$$S/N = \frac{H^T(f)\, G_{ss}(f)\, H^*(f)}{H^T(f)\, G_{nn}(f)\, H^*(f)} . \tag{6.4}$$

If we assume an output of the form

$$y(f) = H^T(f)\, x(f) \tag{6.5}$$

and if we construct the antenna pattern outputs in the preceding form,
then we can use the above definition of signal-to-noise ratio to observe,
on an iteration to iteration basis of the adaptive weight calculation,
how the S/N is changed at the output of the adaptive filter. The
derivation of the signal-to-noise ratio for each beam appears in
Appendix A.

The following generic cases have been simulated, and the
parameters used in the simulation appear in Tables I and II.

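Case 1.

$$ \text{Beam 1:}\quad x_1 = (\mathrm{s1})\,s_1 + (a)\,s_2 + (b)\,s_3 + (\mathrm{cc})\,\mathrm{sn}_1 $$

$$ \text{Beam 2:}\quad x_2 = (f)\,s_1 + (\mathrm{s2})\,s_2 + (g)\,s_3 + (\mathrm{dd})\,\mathrm{sn}_2 $$

$$ \text{Beam 3:}\quad x_3 = (h)\,s_1 + (p)\,s_2 + (\mathrm{s3})\,s_3 + (\mathrm{ee})\,\mathrm{sn}_3 $$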

This  case  corresponds  to  the  physical  situation  where  there  is 
coupling  of  signals  into  all  beams. 
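Case 2.

$$ \text{Beam 1:}\quad x_1 = (\mathrm{s1})\,s_1 + (a)\,s_2 + (b)\,s_3 + (\mathrm{cc})\,\mathrm{sn}_1 $$

$$ \text{Beam 2:}\quad x_2 = (\mathrm{s2})\,s_2 + (\mathrm{dd})\,\mathrm{sn}_2 $$

$$ \text{Beam 3:}\quad x_3 = (p)\,s_2 + (\mathrm{s3})\,s_3 + (\mathrm{ee})\,\mathrm{sn}_3 $$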

This beam configuration corresponds to coupling of signal two
($s_2$) and signal three ($s_3$) into Beam 1, and signal two ($s_2$) into Beam 3.

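Case 3.

$$ \text{Beam 1:}\quad x_1 = (\mathrm{s1})\,s_1 + (a)\,s_2 + (\mathrm{cc})\,\mathrm{sn}_1 $$

$$ \text{Beam 2:}\quad x_2 = (\mathrm{s2})\,s_2 + (\mathrm{dd})\,\mathrm{sn}_2 $$

$$ \text{Beam 3:}\quad x_3 = (p)\,s_2 + (\mathrm{s3})\,s_3 + (\mathrm{ee})\,\mathrm{sn}_3 $$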

This  beam  pattern  corresponds  to  a coupling  of  signal  two  (s2) 
into  both  Beam  1 and  Beam  3 and  no  other  coupling. 

It is important to note that the self noise ($\mathrm{sn}_1$ to $\mathrm{sn}_3$)
contained in each beam only limits the maximum signal-to-noise ratio
obtainable, and the effectiveness of the algorithm can best be measured
by comparing percent improved or percent of theoretical maximum attained
rather than absolute levels.

All  signals  used  in  these  simulations  are  assumed  to  have  Gaussian 
distributions  with  zero  mean  and  adjustable  variances.  The  variable 
variance  allows  one  to  vary  the  power  contained  in  any  signal. 
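
A minimal sketch of how one snapshot of the Case 1 beam outputs might be generated under these assumptions is given below. It uses real-valued Gaussian samples for simplicity, and the coupling coefficients and variances are hypothetical placeholders rather than the values of Tables I and II.

```python
import random

def draw_sources(variances, rng):
    """Draw one zero-mean Gaussian sample per named source."""
    return {name: rng.gauss(0.0, var ** 0.5) for name, var in variances.items()}

def case1_beams(c, s):
    """Combine the sources into the three beam outputs of generic Case 1."""
    x1 = c["s1"] * s["s1"] + c["a"] * s["s2"] + c["b"] * s["s3"] + c["cc"] * s["sn1"]
    x2 = c["f"] * s["s1"] + c["s2"] * s["s2"] + c["g"] * s["s3"] + c["dd"] * s["sn2"]
    x3 = c["h"] * s["s1"] + c["p"] * s["s2"] + c["s3"] * s["s3"] + c["ee"] * s["sn3"]
    return [x1, x2, x3]

# Hypothetical settings: s2 carries 100 times the power of s1 and s3.
rng = random.Random(0)
variances = {"s1": 1.0, "s2": 100.0, "s3": 1.0,
             "sn1": 0.1, "sn2": 0.1, "sn3": 0.1}
coeffs = {"s1": 1.0, "a": 0.3, "b": 0.1, "cc": 1.0,
          "f": 0.2, "s2": 1.0, "g": 0.2, "dd": 1.0,
          "h": 0.1, "p": 0.3, "s3": 1.0, "ee": 1.0}
sources = draw_sources(variances, rng)
beams = case1_beams(coeffs, sources)
print(beams)
```

Setting a coupling coefficient to zero removes the corresponding term, which is how Cases 2 and 3 follow from the same form.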
6.2 Algorithm Parameter Determination

A gain sequence of the form

$$ \mu_n = \frac{S_c}{n^{\ell}}, \qquad 0.5 < \ell \le 1.0 \tag{6.6} $$

has been used in the experiments. This gain sequence was derived from
both speed and stochastic stability considerations. It not only gives
fast initial improvement and has built-in statistical smoothing
properties, but it is easy to determine the parameters necessary for its
use in the stochastic adaptive algorithm which determines the matrix
filter.
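
The short sketch below generates the first few gains, assuming the sequence is a constant divided by the iteration index raised to an exponent between 0.5 and 1.0 (the reading of Eq. (6.6) adopted for this example). The function name and the numerical constant are illustrative only.

```python
def gain_sequence(s_c, ell, count):
    """Return the first `count` gains s_c / n**ell for n = 1, 2, ..."""
    if not 0.5 < ell <= 1.0:
        raise ValueError("exponent must satisfy 0.5 < ell <= 1.0")
    return [s_c / n ** ell for n in range(1, count + 1)]

# Hypothetical constant s_c = 2.0 with the simplest exponent, ell = 1.
gains = gain_sequence(2.0, 1.0, 4)
print(gains)
```

The first gain equals the constant itself, which gives the large initial step, and the decaying tail supplies the smoothing.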

From the analysis of the idealized algorithm, a gain, $S_c$, can be
determined which lies on the stability boundary of the dynamical control
system representation (see Figure 8) of the stochastic recursive
algorithm.

This gain, $S_c$, has been determined from the convergence
conditions of the idealized algorithm to be:

$$ S_c = \frac{2.0}{\lambda_{\max}} , \tag{6.7} $$

where $\lambda_{\max}$ is the largest eigenvalue of the input power spectral
density matrix $G_{xx}(f)$.

The power in a beam is a good approximation to the desired
eigenvalues, and a circuit that measures the power in a beam can be
easily implemented. It is also possible to bound the maximum eigenvalue
of the input power spectral density matrix. If we define the trace of
a matrix to be the sum of the diagonal terms, then the bound can be
written as

$$ \lambda_{\max} \le \operatorname{trace}[G_{xx}(f)] . \tag{6.8} $$
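
To make the relation between the largest eigenvalue, the trace bound, and the gain constant concrete, the sketch below estimates the largest eigenvalue of a small Hermitian matrix by power iteration and compares it with the trace. Power iteration is used here only because it is short; it is not the QL routine used in the experiments, and the matrix is a hypothetical example.

```python
import math

def matvec(g, v):
    """Multiply matrix g by vector v."""
    return [sum(g[i][j] * v[j] for j in range(len(v))) for i in range(len(g))]

def largest_eigenvalue(g, iterations=200):
    """Estimate the largest eigenvalue of a Hermitian, non-negative definite g."""
    n = len(g)
    v = [1.0 + 0j] * n
    lam = 0.0
    for _ in range(iterations):
        w = matvec(g, v)
        norm = math.sqrt(sum(abs(c) ** 2 for c in w))
        if norm == 0.0:
            return 0.0
        v = [c / norm for c in w]
        gv = matvec(g, v)
        # Rayleigh quotient v^H G v (real for Hermitian G).
        lam = sum(v[i].conjugate() * gv[i] for i in range(n)).real
    return lam

def trace(g):
    """Sum of the diagonal terms (the beam powers)."""
    return sum(complex(g[i][i]).real for i in range(len(g)))

# Hypothetical Hermitian spectral density matrix with eigenvalues 3, 1, 1.
g_xx = [[2.0, 1j, 0.0], [-1j, 2.0, 0.0], [0.0, 0.0, 1.0]]
lam_max = largest_eigenvalue(g_xx)
s_c = 2.0 / lam_max                # gain from the largest eigenvalue
s_c_trace = 2.0 / trace(g_xx)      # more conservative gain from the trace
print(lam_max, trace(g_xx), s_c, s_c_trace)
```

Because the trace is never smaller than the largest eigenvalue of such a matrix, a gain computed from the trace stays at or below the stability boundary, at the cost of a smaller step.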

In the experiments that were performed, the eigenvalues were
obtained both from the power measurement method and from the more
accurate analytical method, the QL algorithm (Wilkinson, 134). The QL
algorithm finds the eigenvalues and eigenvectors by a three-stage
process consisting of finding a series of similarity transformations (QL
matrices), which reduce the original matrix to a diagonal matrix with the
eigenvalues along the diagonal. The eigenvalue algorithm is based on
three simple principles from linear algebra (Hohn, 38). These principles
are:

1.  Any  Hermitian  matrix  may  be  reduced  to  tridiagonal 
(Hessenberg  generally)  form  by  elementary  similarity 
transformations. 

2.  Any  symmetric  matrix  A can  be  diagonalized  by  an 
orthogonal  similarity  transformation  and  the  matrix 
which  is  used  to  diagonalize  A has  as  its  columns 
an  orthonormal  set  of  eigenvectors  for  A. 

3. Similar matrices have the same eigenvalues (characteristic
roots) but different eigenvectors which
are related by the matrix of transformations.

The special purpose eigenvalue and eigenvector routine consists
of three parts. The first part is the transformation of the original
complex Hermitian matrix to tridiagonal form. The second part of the
method is the application of the implicit QL algorithm to the resultant
tridiagonal matrix to obtain its eigenvalues and eigenvectors. Since
the eigenvalues of the original complex Hermitian matrix are preserved
under the similarity transformations used in the algorithm, it only
remains to back transform the eigenvectors of part two to the
eigenvectors of the original complex Hermitian matrix using the similarity
transformations of part one.

If we consider a unit of time as the time required to do one
multiplication and one addition, and N is the order of the original
complex Hermitian matrix, then the total time needed to obtain all of the
eigenvalues and eigenvectors is $(4/3)N^2$ units. For multiplication times
on the order of 300 nanoseconds, this method of calculating the
eigenvalues and eigenvectors can be considered a real time performer.

One could employ many different strategies to both accelerate
the search and obtain filter weights closer to the optimum. No effort
has been made, however, to optimize the search strategy. The motive
behind the adaptive filter was to develop a method which gave the
greatest improvement in S/N in the fewest number of iterations. These
attributes make the system applicable for on-line use in an adaptive
processor. It is theoretically possible to develop a search strategy
for the recursive algorithm whereby one approaches the optimum within a
small error. To do this would require many more computations at each
iteration, and the speed of the algorithm would be slowed considerably.

All methods to accelerate convergence require taking more observations
at a given iteration of the algorithm, and this clearly slows down the
speed of the estimation process. It can be seen from the experimental
results that both a modification of the special gain constant, $S_c$, and
the infinite sequence, $1/n^{\ell}$, can accelerate convergence. No effort
has been made to accelerate the convergence rate by varying the exponent
$\ell$. The main reason for this is that one design objective of this
adaptive algorithm is that the algorithm must be obtained in as
computationally simple a manner as possible within the constraint of rapid
enough convergence for real time use of the algorithm. These design
objectives can be obtained with the simplest exponent, which is one. If
even faster convergence is desired, then a variation of the exponent is
required.

The recursive algorithm used to derive the matrix filter can be
implemented using $2N^2$ complex multiplications, $N^2$ real
multiplications and $2N^2$ complex additions (N is the number of beams) per
iteration. With present computing speeds the algorithm can be
implemented in real time.
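
For a feel for the numbers, the sketch below tallies the per-iteration operations for a given number of beams, reading the counts as 2N² complex multiplications, N² real multiplications and 2N² complex additions. The conversion of one complex multiplication into four real multiplications, and the reuse of the 300 nanosecond multiplication time quoted for the eigenvalue routine, are assumptions made for this example.

```python
def operations_per_iteration(n_beams):
    """Operation counts per iteration for an N-beam matrix filter."""
    n2 = n_beams * n_beams
    return {"complex_mult": 2 * n2, "real_mult": n2, "complex_add": 2 * n2}

def real_multiplies(counts):
    """Equivalent real multiplies, assuming four per complex multiply."""
    return 4 * counts["complex_mult"] + counts["real_mult"]

counts = operations_per_iteration(3)
# Assumed 300 ns (0.3 microsecond) per real multiplication.
time_us = real_multiplies(counts) * 0.3
print(counts, real_multiplies(counts), time_us)
```

For the three beam system this gives a few tens of microseconds of multiplication time per iteration under the stated assumptions.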

6.3  Adaptive  Algorithm  Applications 

We  can  write  the  matrix  filter  for  the  three  beam  system  as 


$$ H(f) = \begin{bmatrix} h_{11}(f) & h_{12}(f) & h_{13}(f) \\ h_{21}(f) & h_{22}(f) & h_{23}(f) \\ h_{31}(f) & h_{32}(f) & h_{33}(f) \end{bmatrix} . \tag{6.9} $$

The stochastic approximation algorithm will simultaneously derive
all the terms in H(f), and then the filter is applied in recursive
fashion to the three beam system to see how well the interfering noises
in each beam are removed as the system evolves in time.

Two  applications  of  the  decoupling  matrix  are  considered.  The 
physical  implementation  in  control  system  form  of  either  case  of  the 
adaptive  filter  appears  in  Figure  10.  The  first  application  is  when 
the  filter  coefficients  along  the  diagonal  are  not  constrained  to  be  one, 
and  the  second  application  is  when  the  diagonal  filter  coefficients  are 
constrained  to  be  one.  If  the  experiment  is  set  up  in  a physically 
correct  manner,  then  it  makes  no  difference  in  the  performance  of  the 
algorithm,  except  for  differences  in  attainable  mean  square  error, 
whether the diagonal terms are constrained or not. Sample cases have
been run for each application, and the experimental verification of the
above statements is discussed. One of the advantages of operating in the
constraint  mode  is  that  there  is  an  upper  bound  on  the  mean  square  error 
or  a lower  bound  on  the  signal-to-noise  ratio.  While  this  might  be 
considered  an  advantage  for  the  long  term  operation  of  the  algorithm, 
it  has  little  effect  on  the  time  frame  of  interest  considered  here. 

Constraining  the  diagonal  terms  of  the  matrix  decoupling  filter 
to  have  a gain  of  one  is  what  is  commonly  referred  to  in  adaptive  array 
processing  (Angerson,  5;  Applebaum,  10;  Claerbout,  25)  as  maintaining 
the  main  beam.  This  is  done  in  practice  to  prevent  boresight  signal 
cancellation.  To  prevent  main  lobe  signal  cancellation,  the  artifice 
of  a pilot  or  reference  steering  signal  is  used.  The  pilot  signal 
inhibits the adaptive processor from responding to signals with
prescribed directional characteristics. This technique for maintaining
the main lobe is the equivalent of incorporation of a spatial filter to
restrict the gain of the adaptive processor in the boresight direction,
and is well known in practice. Making the gain one along the diagonal
ensures that no more signal than is already in that beam is
transmitted through the filter. A perusal of Figure 4 illustrates the
physical reasoning behind the mathematical technique of the diagonal
constraint.

Figure 10. Implementation of Adaptive Filter Decoupling Network
(• denotes evaluated at time n)

This idea is also equivalent to the mathematical idea of a
projection operator (Halmos, 53). We are forcing (constraining) certain
filter values to be on a certain part of the minimization surface
corresponding to the rules of the constraint. The idea of a projection
operator is well known in linear algebra (Stoll, 120). The method used
here  produces  exactly  the  same  result.  In  essence,  it  limits  certain 
matrix  filter  coefficients  by  projecting  a particular  filter  coefficient 
to  have  a certain  value  corresponding  to  the  constraint  boundary  and  no 
other.  Many  authors  have  used  the  idea  of  a projection  operator. 

Rosen  (103)  used  the  idea  in  basic  gradient  search  techniques.  However, 
he  had  no  means  to  keep  the  projection  exactly  on  the  constraint 
boundary.  Frost  (47)  has  applied  the  same  idea  to  array  processing. 

With  the  projection  operator  used  by  Frost,  one  can  guarantee  that  the 
projection  will  always  be  along  a boundary.  The  technique  used  here  is 
equivalent,  in  operation,  to  Frost's  projection  operator. 

Mendel and Fu's (86) first order method of projection operator
forces the adaptive algorithm to use values of the estimated variable
only if the variables are within a cube. They know from physical
considerations the limits on some of the random variables to be estimated.

For the case of the beam patterns considered here, the use of
this technique is indicated from the consideration of the weak signal,
strong interference assumption. For, if the experiment is constructed
in a physically realizable manner, then there cannot possibly be a
situation where a signal has more power in the beam where it is
considered a noise than in the beam where it is considered a signal.
Since it is unreasonable to allow the filter to feed more signal into a
beam than is already there, the technique of constraining the filter
estimate, by use of the projection operator technique, is dictated by
physical considerations.

In  stochastic  approximation,  sufficient  conditions  for  mean 
square  and  probability-one  convergence  are  satisfied  within  some  unknown 
bounded  convex  set.  If  a convergence  region  were  known,  a reflecting 
barrier  at  the  boundary  would  solve  the  global  convergence  conditions 
and  the  estimate  would  converge  in  mean  square  and  with  probability  one. 
Davisson  (33)  has  proven  that  if  we  let  A be  the  event  where  the 
estimate  sequence  remains  within  some  given  convergence  region,  then 
convergence  conditioned  on  A occurs  in  mean  square  and  with  probability 
one  because  the  sequence  of  estimates  is  the  same  as  if  a reflecting 
barrier  were  placed  at  the  boundary.  Davisson  showed  that  the 
unconditional  probability  of  convergence  is  bounded  below  by  the 
probability  of  the  event  A.  The  point  is  that  with  a reflecting 
barrier, convergence  is  guaranteed.  Without  a reflecting  barrier, 
global  conditions  can  be  bounded  by  the  probability  of  event  A.  When 
there  are  equality  constraints  imposed  on  the  solution,  the  reflecting 
barriers  can  be  considered,  by  analogy  with  Markov  processes,  absorbing 
barriers and the absorbing barriers then give the same result as the
projection  operators.  The  mathematical  meaning  of  an  absorbing 
barrier  can  be  given  by  the  following  definition. 

Definition  6.1 

A state j in a Markov chain is said to be absorbing if $P(j,i) = 0$
for all states $i \ne j$. In other words, a state j is absorbing if
$P(j,j) = 1$. One cannot leave an absorbing state. The probability
$P(i,j)$ is defined as the transition probability of being in state i
and moving to state j.
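
The definition can be checked mechanically on a transition matrix. The sketch below, with a hypothetical three-state chain, lists the states whose self-transition probability is one.

```python
def absorbing_states(p, tol=1e-12):
    """Return the indices j for which P(j, j) equals one."""
    return [j for j, row in enumerate(p) if abs(row[j] - 1.0) <= tol]

# Hypothetical chain: states 0 and 2 are absorbing, state 1 is transient.
p = [[1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0]]
print(absorbing_states(p))
```

In the analogy drawn in the text, a filter coefficient held at its constrained value plays the role of a state that is never left.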

For physically realizable cases, the technique of constraining
the diagonal terms of the matrix adaptive filter is analogous to both
the projection operator technique and the reflecting barrier. It is
similar to the projection operator in that the linear transformation
necessary to convert the old matrix filter to the new filter (satisfying
the constraint) can be determined from the solution of a set of
simultaneous linear equations. If we assume that the matrix A is the
desired projection operator, then we can write:

$$ A\,H_{\mathrm{OLD}} = H_{\mathrm{NEW}} . \tag{6.10} $$

This  set  of  equations  guarantees  that  the  diagonal  terms  of  the  new 
matrix  filter  are  constrained  to  be  one.  The  constraint  technique  uses 
the  idea  of  an  absorbing  barrier  in  that  it  uses  a priori  knowledge  of 
the  solution  to  constrain  certain  matrix  filter  coefficients  to  have 
certain  values. 
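
A minimal sketch of this idea follows. It forces the diagonal of a hypothetical old filter matrix to one and then recovers the transformation A by solving the linear system, here through an explicit inverse, which assumes the old matrix is nonsingular. The helper names and the numerical values are illustrative and are not taken from the experiments.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def invert(m):
    """Gauss-Jordan inverse with partial pivoting (complex entries allowed)."""
    n = len(m)
    aug = [[complex(v) for v in row] + [complex(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def constrain_diagonal(h):
    """Return a copy of h with every diagonal term set to one."""
    return [[1.0 + 0j if i == j else complex(v) for j, v in enumerate(row)]
            for i, row in enumerate(h)]

# Hypothetical unconstrained filter estimate.
h_old = [[0.8, 0.1j, 0.2], [0.05, 1.2, -0.1], [0.0, 0.3, 0.9 + 0.1j]]
h_new = constrain_diagonal(h_old)
a = matmul(h_new, invert(h_old))   # A such that A * H_old = H_new
check = matmul(a, h_old)
print(check)
```

In practice one would simply overwrite the diagonal after each update; computing A explicitly only shows that the constraint can be written as a linear transformation of the old filter.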




6.3.1 Effects of Special Gain Sequence

Recall that the generic form of the special gain sequence is

$$ \mu_n = (S_c)\,\frac{1}{n^{\ell}}, \qquad 0.5 < \ell \le 1.0 \tag{6.11} $$

where $S_c$ has been shown in Section 5.1.1 to be equal to $2/\lambda_{\max}$.
To show the effect of this gain sequence on the convergence rate of the
recursive algorithm, two tests are conducted. The exponent $\ell$
is varied over its range, and the proportion of the gain constant $S_c$
used in the algorithm is varied.

It can be seen from Figure 11 that, as the exponent $\ell$ is decreased,
the rate of convergence goes up. From the mathematical considerations
of the infinite sequence

$$ \left\{ \frac{1}{n^{\ell}} \right\}_{n=1}^{\infty} \tag{6.12} $$

it can be observed that as $\ell$ increases from 0.51 to 1.0, the
corresponding terms in each series decrease. This means that the effect on
the stochastic algorithm by the sequence increases as the exponent $\ell$
decreases if we consider succeeding iterations of the algorithm. This
means that as $\ell$ decreases, it takes the gain sequence a shorter time
to effect the same increase in output S/N. The price one pays for this
acceleration is noisy results. As the exponent $\ell$ is varied from 0.6
to 1.0, one can observe the effect on the output S/N. The exponent
$\ell = 0.6$ has the fastest increase, but it is so noisy that the filter
coefficients would not be usable in a practical system. The case where
$\ell = 0.8$ is slower, but the filter values have a smaller variance and
could be usable. The case where $\ell = 1.0$ is the smoothest of all and
yet it is fast enough for on-line use. With the stochastic approximation
technique, it is possible to obtain fast convergence and yet obtain
filter coefficients with small variance. The exponent $\ell = 1.0$ has
the added advantage of computational simplicity, and this advantage is
not to be underestimated in practical system use.
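
The trade-off between speed and smoothing can be explored with a scalar toy problem. The sketch below is not the matrix filter algorithm; it is a one-dimensional recursive estimate of a mean, driven by a gain of the form s / n**ell, with hypothetical data. With s = 1 and ell = 1 the recursion reduces to the running sample mean, which is the source of the smoothing; smaller exponents keep the gain larger for longer, so the estimate follows recent samples more closely and tends to fluctuate more.

```python
import random

def recursive_estimate(samples, s=1.0, ell=1.0, start=0.0):
    """Scalar recursion w <- w + (s / n**ell) * (x_n - w); returns all iterates."""
    w = start
    history = []
    for n, x in enumerate(samples, start=1):
        w = w + (s / n ** ell) * (x - w)
        history.append(w)
    return history

# Hypothetical noisy observations of a constant level.
rng = random.Random(1)
samples = [rng.gauss(2.0, 1.0) for _ in range(200)]
smooth = recursive_estimate(samples, ell=1.0)   # running sample mean
fast = recursive_estimate(samples, ell=0.6)     # heavier weight on recent data
print(smooth[-1], fast[-1])
```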

The variation of the gain constant $S_c$ also has effects on the
stochastic algorithm. It has been derived that the optimum gain constant
is

$$ S_c = \frac{2.0}{\lambda_{\max}} . \tag{6.13} $$

If this gain constant is rewritten as:

$$ S = G\,\frac{2.0}{\lambda_{\max}} \tag{6.14} $$

then G can be varied and its effect on convergence can be observed.
The results of varying G over the range

$$ 0.07 \le G \le 0.9 \tag{6.15} $$

can be seen in Figure 12.

It can be seen that there is a significant difference in the
values obtained for the output S/N for the different values of G. For
20 iterations, the values of output S/N varied from -10 dB for G = 0.07
to +6 dB for G = 0.9. For G = 0.9, the algorithm attains a S/N
of +6 dB in less than ten iterations. For G = 0.7, it takes
approximately 25 iterations. For G = 0.1 and 0.07, the algorithm
never does reach +6 dB in the running time of the test. It can also
be observed that the G = 0.9 case is noisier than any other case.
This result can be expected since the algorithm is operating close to
its stability limit. The case where G = 0.5 is a very smooth curve
and would be desirable for practical use. However, it is possible to
operate the algorithm near its stability limit and still obtain small
variance in the filter coefficients. For all of the cases tested in
the experimental setup, a gain G of 0.7 was used.

These results are significant because they show that the algorithm
must be operated as near to the maximum gain constant $S_c$ as possible
to be able to operate the algorithm on-line. If a deterministic
algorithm were operated with values of G = 0.01, the resultant filter
coefficients would not be usable. The deterministic attempts at
adaptive filters (Griffiths, 51) have suggested that G be 0.001 or
less to obtain usable filter coefficients. The great power of this
stochastic approximation method is that it allows the fastest increase
in output S/N, and yet its built-in statistical smoothing qualities
permit use of the filter coefficients even at these high levels of gain.
This result has not been obtained with any deterministic algorithm.

A case where G was equal to 1.5 times the maximum gain was
tried and it did indeed become unstable as predicted by theory. No
plots were obtained because the filter coefficients became too large to
plot after only a few iterations.
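
The stability boundary can be illustrated with a deliberately simplified model. The sketch below assumes an idealized, noise-free recursion with a constant gain equal to G times 2/λ_max, and follows only the error component along the eigenvector of the largest eigenvalue; each step then multiplies that error by 1 - 2G. This scalar model is an assumption made for illustration and omits both the 1/n^ℓ decay and the random inputs of the actual algorithm.

```python
def mode_error(g, iterations, e0=1.0):
    """Error along the largest-eigenvalue mode after repeated constant-gain steps."""
    # With gain G * 2 / lambda_max, the per-step factor is 1 - 2 * G.
    factor = 1.0 - 2.0 * g
    e = e0
    for _ in range(iterations):
        e *= factor
    return e

for g in (0.07, 0.5, 0.7, 0.9, 1.5):
    print(g, abs(mode_error(g, 20)))
```

In this model the error shrinks for any G between zero and one and grows without bound for G above one, which is consistent with the blow-up reported for G = 1.5.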


Under  the  weak  signal,  strong  interference  assumption,  the 
adaptive  filter  performs  its  function  exactly  as  predicted  by  theory. 

In  all  the  cases  tested  in  the  simulation,  greater  than  50%  [on  a 
decibel  (dB)  basis]  of  the  strong  interfering  signal  was  removed  in 
less  than  20  iterations  of  the  adaptive  algorithm. 

In  the  case  where  there  is  no  interfering  noise,  the  adaptive 
filter  initially  exhibits  a decrease  in  S/N.  The  amount  of  decrease 
depends  directly  on  the  strength  of  the  signal  in  the  no-noise  beam  and 
whether  the  implicit  constraints  are  applied.  The  reason  for  this  is 
twofold.  The  first  reason  is  that  the  filter  values  are  initialized 
to  an  identity  matrix  which,  for  the  beams  with  no  noise  in  them, 
corresponds  to  the  optimum  filter.  Any  deviation  from  this  point  will 
result  in  a degradation  in  the  performance  of  the  beam  with  no  noise. 
Thus,  while  the  filter  weights  are  adapting  in  other  dimensions,  the 
beam with no noise suffers. If the first big step takes the search
far from the optimum, then if the power is large, the variable gain
does not have enough length, i.e., it needs a larger step size to
progress rapidly in the no-noise beam search surface. Theory guarantees
eventual success but the process can become exceedingly slow. From the
theory of minimization or search techniques, this is a predictable
result. Since the variable gain sequence determines the convergence
rate, the longer the recursive algorithm is run, the less effect the
variable gain has on the minimization procedure. A more powerful
search strategy can considerably enhance this process. This result only
applies to the no-noise beam and not to any beams containing interfering
noise because, in those beams, the largest steps are made toward the
optimum and not away from it as in the no-noise beam. Figure 13 shows a
case where the simulation was run to over 5000 iterations of the stochastic
algorithm. It can be discerned from this plot that the filter is
approaching the optimum with a small but finite slope. To obtain the
true optimum with this particular search strategy was not possible due
to limitations on computer time. Running the algorithm for a large number
of iterations was not desirable because the long term results were not
of primary interest.

When  the  power  of  the  signal  in  the  beam  is  equal  to  the  other 
beams  or  the  implicit  constraints  are  used,  then  it  can  be  seen  from 
Figure 14 that the S/N does not decrease as much as when these
conditions are not fulfilled. It can be seen from Figure 14 that the
S/N  only  goes  down  to  about  12  dB  (from  a theoretical  maximum  of  20  dB) 
then  starts  to  increase  almost  immediately. 

It  has  been  shown  from  the  experimental  results  that  the  more 
noise  there  is  in  adjacent  beams,  the  longer  it  takes  the  beam  with  no 
noise  to  start  increasing  its  S/N.  This  fact  can  also  be  gleaned  from 
intuitive  considerations.  The  filter  is  derived  so  as  to  reduce  the 
effect of interfering noises, and the filter performs this function
before it attacks any other inaccuracies. Thus, when there are large
interferences  in  a given  beam,  the  filter  weights  must  adjust  to  remove 
both  large  interferences  and  large  power,  and  it  is  slow  in  attacking 
the  inaccuracies  of  the  no-noise  beam  because  the  mean  square  error  is 
smaller,  on  a percentage  basis,  in  the  no-noise  beam  than  in  the  beam 
with  large  interferences.  When  there  are  few  interferences  and  the 
power  is  not  large,  the  filter  weights  do  not  have  to  adapt  greatly  to 
remove the interfering noises, and thus the filter weights can adjust
out  the  interfering  noises  and  then  work  on  the  beams  where  there  are 
no  interfering  noises.  The  more  power  and  the  greater  the  number  of 
interfering  noises,  the  longer  it  takes  the  beam  with  no  noise  to 
approach  its  own  theoretical  minimum  mean  square  error  or  maximum 
signal-to-noise  ratio. 

Singer and Frost (115) have derived an expression which bounds
the steady state error covariance of the Wiener filter. The relation
shows that the greater the interfering noise intensities, the greater
the steady state filtering error. This means that with high interfering
noise powers, a small error in filter coefficients results in a large
increase in mean square error. This result is deceiving because if all
the signals were normalized, this effect would not appear and a small
error in filter weights would result in a small mean square error. This
attribute of the filter was also observed in the experimental tests.
The use of the largest eigenvalue to speed convergence helps
considerably when there is a large spread in eigenvalues. If a more
sophisticated search procedure were devised where each eigenvalue was
used in sequence after a percentage change in the Frobenius norm, then
this phenomenon of the matrix filter could be alleviated even more and
possibly eliminated altogether.

Using  a more  sophisticated  search  procedure  similar  to  those 
employed  in  deterministic  systems  would  decrease  the  time  to  reach  the 
optimum.  Kushner  (74,75)  has  tried  to  formulate  search  strategies  for 
stochastic  type  algorithms  but  these  techniques  have  been  proven  for  a 

If the filter were started at some other point, then depending
on  where  it  was  started,  the  S/N  could  either  increase  or  decrease.  If 
one  was  far  from  the  optimum,  then  the  S/N  would  increase,  but  in  this 
simulation  all  filters  are  initialized  to  the  same  identity  matrix;  i.e. 
optimum  filter  for  no  noise,  and  thus,  in  the  case  of  the  beam  with  no 
noise,  its  S/N  ratio  goes  down  for  a time  and  then  starts  to  increase 
after  a number  of  iterations  of  the  algorithm.  Since  we  assume  no  a 
priori  knowledge  as  to  where  to  start  the  recursive  algorithm,  this 
initialization  is  as  good  as  any.  Of  course,  any  a priori  knowledge 
could  be  used  to  alleviate  this  problem  and  increase  the  convergence 
rate. 

6.4  Generic  Case  Results 

It has been assumed in the simulations that all signals are
uncorrelated, i.e.,

$$ E\{s_1 s_2\} = E\{s_1 s_3\} = E\{s_2 s_3\} = 0 \tag{6.16} $$

and that all self noises are uncorrelated both with the signals $s_1$,
$s_2$ and $s_3$ and with other self noises.


The  position  of  a signal  or  noise  in  any  beam  is  adjusted  by 
varying  the  coupling  coefficients  (Table  I).  The  power  of  the  signals 
or  noises  is  adjusted  by  changing  the  variances  of  the  signals.  The 
power  contained  in  each  case  tested  appears  in  Table  II.  The  coupling 
coefficients  give  the  amount  of  signal  which  is  contained  in  another 
beam  as  noise.  The  actual  form  of  the  individual  cases  can  be  gleaned 
from  a perusal  of  Table  II  and  Equations  (6.1),  (6.2)  and  (6.3).  A 
graphic  picture  of  the  significance  of  these  coefficients  can  be  seen 
in  Figure  4.  Unless  otherwise  indicated,  only  the  curves  of  S/N  for 
beams  with  interfering  signals  are  included. 

Figure  15  shows  that  the  S/N  ratio  goes  up  in  direct  relation 
to  the  way  the  mean  square  error  goes  down.  From  Figure  15,  we  can 
see  that  the  mean  square  error  decreases  initially  and  the  S/N  goes  up 
in  the  same  number  of  iterations.  This  is  not  a surprising  result  and 
it  is  not  only  intuitively  pleasing  but  also  verified  by  experimental 
tests.  The  fact  that  the  mean  square  error  and  the  signal-to-noise 
ratio  are  related  in  an  inverse  manner  is  not  a new  result.  Riegler 
and  Compton  (100)  also  verified  in  their  experiments  that  the  MSE  goes 
up  in  the  same  way  that  the  S/N  goes  down.  For  the  remainder  of  the 
tests,  the  signal-to-noise  ratio  is  used  as  the  criterion  of  system 
performance.  Mean  square  error  is  used  only  to  illustrate  certain 
results. 

The results of the tests for the first generic case can be seen
in Figure 16. Figure 16 contains the test results when there were no
constraints. Signal $s_2$ had 100 times the power of the other two
signals and it was coupled as an interfering signal in both of the other
beams. For this case, there were two interfering signals, $s_2$ and $s_3$,
in Beam 1 and two interfering signals, $s_1$ and $s_2$, in Beam 3. Even
with these large interfering signals, the interference effect on the
desired signals was removed in less than 20 iterations of the algorithm.
The output S/N ratio attained greater than 50% of its theoretical
maximum value in less than 10 iterations. The percentage is calculated
by taking the ratio of the increase in S/N to the maximum attainable S/N.
The output S/N started at -12.5 dB and increased to +8 dB in approximately 20
iterations.

It  can  be  seen  from  Figure  17  that  the  constraint  mode  operated 

in  exactly  the  same  manner  as  the  no  constraint  mode.  It  can  be  stated 

that  the  only  difference  discovered  in  the  operation  of  the  stochastic 
algorithm  between  the  two  cases  (constraint  and  no  constraint)  was  in 
beams  which  had  no  noise.  This  result  was  also  discussed  in  Section  6.2. 

Figure 18 shows the case where one signal, $s_2$, has one hundred
times the power of the other two. This large signal was injected into
the other two beams by the coupling coefficients which appear in Table
II. Another signal, $s_3$, was also injected into Beam 1. Thus, Beam 1

had  two  interfering  signals  and  Beam  3 had  one  interfering  signal.  The 
interfering signal in each beam was decreased in the output S/N by over
22 dB. In Beam 1, the output S/N started at -12.8 dB and increased to
+10 dB in less than 20 iterations. In Beam 3, the output S/N started at
-12.4 dB and increased to +10 dB in less than 20 iterations.

Another test was conducted for generic case 2 and the results
appear in Figure 19. In this test, the power of s2 was still 100
times the power of s1, but the power of s3 was increased to 10 times
the power of s1. Both s2 and s3 were injected into Beam 1 as noises.
The signal s1 was completely masked and Beam 1 had an extremely poor
initial  S/N  of  -18  dB.  This  case  also  had  the  added  hindrance  in  that 
the  interfering  signal  s^  was  put  near  the  crossover  point  of  the 
beams. This should be the most difficult position from which to remove
an interfering noise. In spite of these worst-case conditions, the
algorithm increased the output S/N to +8 dB in less than 20 iterations.
This was an increase of 20 dB against a theoretical perfect solution of
38 dB, an impressive result.
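
The percentage figure used throughout these tests can be checked with a short calculation. The following is a minimal sketch that assumes the ratio is formed directly from the dB values, as the figures quoted for generic case 2 suggest; the function name percent_of_max is hypothetical.

```python
def percent_of_max(increase_db, max_attainable_db):
    """Ratio of the S/N increase to the maximum attainable S/N, in percent."""
    if max_attainable_db <= 0:
        raise ValueError("max_attainable_db must be positive")
    return 100.0 * increase_db / max_attainable_db

# Figures quoted for generic case 2: a 20 dB increase against a 38 dB maximum.
case2 = percent_of_max(20.0, 38.0)
print(f"{case2:.1f}% of the theoretical maximum")
```

With the quoted values the ratio comes out just above the 50% level used as the benchmark in the other test cases.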

The  results  for  generic  case  3 appear  in  Figure  20.  This  case 
was  constructed  with  s^  having  100  times  the  power  in  s^  and  s^ 
had  10  times  the  power  in  s^  . Signal  s^  was  coupled  into  both  Beams 
1 and  3 and  there  was  no  other  coupling.  As  can  be  seen  from  Figure  20, 
both  Beam  1 and  Beam  3 attained  greater  than  50%  of  their  theoretical 
maximum S/N in approximately 25 iterations. Beam 1 started at
approximately -12 dB and increased to +10 dB in approximately 25
iterations.

6.5 Stopping Rule

The following stopping rule is proposed to determine when
the optimum is attained with a certain accuracy in the probabilistic
sense. It is proposed only as an indication of needed future
results on the long-term properties of the recursive algorithm. There are
special stopping rules for regular iterative methods which are based on
the comparison of the last two iterations, H(n - 1) and H(n), of the
iterative algorithm. The stopping rule proposed for the
probabilistic algorithm is to compute a running average of the form

\[
\mathrm{RAV}^{i,j}(f) = \frac{1}{N} \sum_{k=n}^{n+N-1} H_k^{i,j}(f) , \tag{6.21}
\]

where k is the iteration number of the recursive algorithm and (i,j)
represents the position of the filter coefficient in the matrix filter,
and then to compute the Frobenius norm of the difference of any two
iterations of the running average and compare it with $\varepsilon$,
where $\varepsilon$ is a small real number.

Figure 20. Generic Case Three

If the Frobenius norm of the difference is smaller than a certain
number, $\varepsilon$, the algorithm can be stopped and one can be
assured of the required accuracy in the probabilistic sense.
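
The stopping rule can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure used in the tests: the two running averages are taken over the last two non-overlapping windows of N iterations, the filter history is a toy sequence converging geometrically, and the names running_average, frobenius_norm and should_stop are hypothetical.

```python
import math

def running_average(history, start, window):
    """Element-wise mean of `window` consecutive filter matrices from `start`."""
    rows, cols = len(history[0]), len(history[0][0])
    block = history[start:start + window]
    return [[sum(h[i][j] for h in block) / window for j in range(cols)]
            for i in range(rows)]

def frobenius_norm(a, b):
    """Frobenius norm of the difference a - b of two complex matrices."""
    return math.sqrt(sum(abs(x - y) ** 2
                         for ra, rb in zip(a, b) for x, y in zip(ra, rb)))

def should_stop(history, window, eps):
    """True when the last two non-overlapping running averages differ by < eps."""
    if len(history) < 2 * window:
        return False
    last = running_average(history, len(history) - window, window)
    prev = running_average(history, len(history) - 2 * window, window)
    return frobenius_norm(last, prev) < eps

# Toy history (assumed): a 2x2 complex filter converging toward `target`.
target = [[1 + 1j, 0j], [0j, 2 - 1j]]
history = [[[t * (1 - 0.5 ** k) for t in row] for row in target]
           for k in range(1, 41)]
print(should_stop(history[:6], window=3, eps=1e-3))  # early iterations
print(should_stop(history, window=3, eps=1e-3))      # after 40 iterations
```

The window length plays the role of N, and the sketch makes visible why its choice matters: too short a window lets noise in the coefficients trigger or block the test.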

It is, however, difficult to choose N, the number of iterations
of the adaptive algorithm to average over. This quantity is dependent
upon the statistics of the underlying process and is difficult to
determine a priori. Of course, this stopping rule does not apply to the
on-line system because it requires a great amount of computation. This
stopping rule has not been used in this analysis because the long-term
characteristics of the stochastic adaptive filter were not of interest
here.

CHAPTER VII

CONCLUSIONS  AND  EXTENSIONS 

7.0  Problem  Statement 

The purpose of this research is the derivation of a complex
stochastic processor as the adaptive implementation of the multi-input,
multi-output Wiener filter. The stochastic adaptive processor operates
in an environment where the noise statistics are not known a priori.
It operates under the weak signal, strong interference assumption. The
adaptive matrix filter was to be able to remove the strong interference
and increase the output S/N in a few iterations of the recursive
algorithm (derived from stochastic approximation considerations) used to
derive the adaptive matrix filter. The stochastic adaptive processor
requires only complex multiplications and additions (Section 6.2). On
a digital computer the adaptive processor can be implemented simply
through the use of arithmetic units. With present computing speeds, the
recursive algorithm has as its goal the ability to be implemented in
real time and to obtain the adaptive matrix filter on-line.

7.1  Results  and  Future  Work 

The stochastic adaptive processor's ability to perform the above
functions has been demonstrated by the computer simulation results.
The adaptive processor was designed under the weak signal, strong
interference assumption and, in all cases tested, this objective was
attained. One can, as a consequence of the results obtained, expect a
significant improvement in signal-to-noise ratio at the output of the
adaptive processor.

The stochastic processor removed the strong interfering signals
in all cases tested in a small enough number of iterations to make the
algorithm an on-line performer. The adaptive processor demonstrated its
ability to rapidly increase the output S/N and make it a desirable
input to a detection system. Since the stochastic adaptive algorithm
takes into account the fact that its inputs are random processes, it is
better able to handle the processing requirements than any deterministic
algorithm. Its greatest assets would seem to be its ability to increase
S/N quickly and its built-in ability to handle stochastic processes.
One important extension of the work in this research would be the use of
stochastic approximation when the input random processes are
non-stationary.

The  extension  of  adaptive  array  processors  to  the  multi-input, 
multi-output  case  gives  impetus  for  new  types  of  processing  systems 
(including  the  likelihood-ratio  detector  part  of  the  receiver)  beyond 
those  possible  with  single  output  array  processors. 

The extension of the adaptive array processors to complex
signals offers new possibilities not available in theory based on real
signals. Further theoretical development is facilitated by the analytic
representation of band pass signals, which naturally leads to complex
signals.

The most attractive method for obtaining the optimum solution
would seem, at first glance, to be matrix inversion. An iterative
scheme for matrix inversion would seem a most promising method. But
even when $R_{xx}^{-1}(t)$ or $G_{xx}^{-1}(f)$ is obtained, there still
must be a complex matrix multiplication to determine the optimum filter
[$H_{opt}(f) = G_{xx}^{*-1}(f)\, G_{xd}(f)$]. Since the adaptive
algorithm calculates
the optimum filter directly and does not need a succeeding matrix
multiplication, a savings accrues to the adaptive filter. A more important
consideration is how many iterations and how much arithmetic are involved
in the iterative matrix inverse. Since $R_{xx}(t)$ is not known exactly,
the inverse must be continually updated. It is not clear that any type
of matrix inverse that must be updated has any advantage over the
stochastic adaptive algorithm described here. In fact, the simplicity of
calculation and the ease of updating seem to indicate that, at present,
the stochastic adaptive algorithm is superior. It has not been shown, at
this time, that any iterative matrix inverse has a smaller computational
load than the adaptive filter.

With  only  a few  exceptions,  all  previous  adaptive  processors  have 
been  based  on  real  signals  and  finite  impulse  response  filters  with  real 
weights. The present adaptive matrix filter operates completely on
complex input signals. In many applications this approach leads to a
simpler overall system.

The first objective, to derive the optimum filter, was attained
through the use of the orthogonal projection lemma. This general purpose
technique allows derivation of the minimum mean square error when other
more standard minimization procedures fail. The idea of a projection
operator is widely used in the abstract theory of vector spaces (Halmos,
53) and topology (Kolmogorov and Fomin, 71). The idea of a projection
operator is very general and has found and will continue to find diverse
applications in the engineering literature. Its potential applications
to engineering problems have not yet been exploited in any depth.
Applications of projection operators have crept into the engineering
literature from many diverse problem areas. Kalman (65) used the idea
to derive the minimum variance unbiased estimator which is used so often
in both theoretical control system work and practical control systems.
Frost (47) uses the idea of a projection operator in his adaptive
processor design. Rosen (103) has incorporated the projection operator
into a deterministic search or minimization procedure which is coming
into widespread use.

The special purpose variable gain sequence $\gamma_n$ permits the
stochastic adaptive processor to obtain a fast increase in output S/N,
thereby enabling the adaptive processor to operate in a real time
environment. Using the largest possible stable gain sequence, the
adaptive algorithm obtains fast initial convergence but retains the
advantages of statistical smoothing inherent in the stochastic
approximation technique. This fast increase in output S/N is important in
any practical application of the adaptive processor to communication
systems or control systems. The long term statistical considerations
are important if one needs an accurate answer as to how close to the
optimum solution one can get with either a naive or sophisticated
search strategy.

It was seen that drastic changes in both the convergence rate and the
statistical smoothness of the matrix filter coefficients can be caused
by small changes in either the gain constant or the variable gain
sequence. Variation of the variable gain sequence affected primarily the
smoothness of the coefficients, and the gain constant affected primarily
the initial convergence rate. The variable gain sequence also has a
marked effect on the initial convergence rate, but its effect is not as
adjustable as the effect of the gain constant. This fact makes variation
of the exponent of the gain sequence less valuable for a real time
application. Better mathematical strategies for variation of both the
gain constant and the gain sequence can result in both better initial
speed and long term smoothness. This area of research could produce the
most fruitful results in the application of stochastic approximation
techniques to engineering problems. The stochastic adaptive processor
successfully combines both stochastic principles and complex variables
in an adaptive real time processing environment. The ability to use the
stochastic nature of the input signals without slowing down its
performance is a major attribute of this adaptive matrix filter.
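
The separate roles of the gain constant and of the exponent of the gain sequence can be illustrated with a scalar recursion of the Robbins-Monro type. The sketch below is not the matrix algorithm derived in this work; it assumes a gain sequence of the form g / n^alpha, a single complex coefficient, and hypothetical test values (h_true, the noise level, the function name adapt).

```python
import random

def adapt(gain_constant, exponent, steps=2000, seed=0):
    """Scalar stochastic-approximation recursion for one complex coefficient.

    h_true, the noise level and the signal model are hypothetical test values.
    Returns the final estimate and its distance from h_true.
    """
    rng = random.Random(seed)
    h_true = 0.8 - 0.6j              # coefficient the recursion should find
    h = 0j                           # initial estimate
    for n in range(1, steps + 1):
        x = complex(rng.gauss(0, 0.5 ** 0.5), rng.gauss(0, 0.5 ** 0.5))
        noise = complex(rng.gauss(0, 0.1), rng.gauss(0, 0.1))
        d = h_true * x + noise                       # noisy desired response
        gamma_n = gain_constant / n ** exponent      # variable gain sequence
        h += gamma_n * x.conjugate() * (d - h * x)   # stochastic gradient step
    return h, abs(h - h_true)

# Same gain constant, two exponents: compare the remaining error.
for alpha in (0.6, 1.0):
    _, err = adapt(gain_constant=0.5, exponent=alpha)
    print(f"exponent {alpha}: |h - h_true| = {err:.4f}")
```

Exponents between 0.5 and 1 satisfy the usual stochastic approximation conditions on the gain sequence (divergent sum, convergent sum of squares); a smaller exponent keeps the gain large for longer, trading smoothness for speed.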

The following method is proposed as a means of accelerating
convergence of the stochastic recursive algorithm. The derivation of
the optimum gain constant from considerations of the idealized form of
the recursive algorithm shows that the knowledge and use of the
eigenvalues of the power spectral density matrix $G_{xx}(f)$ gives a large
increase in the convergence rate. The mathematical operation of the
recursive algorithm used to derive the matrix filter coefficients is
not a scalar process but a vector process. Since each dimension of the
vector space of the recursive algorithm has a different convergence
rate, it is necessary to take this fact into account to be able to
approach the fastest convergence rate. If one uses a matrix gain of
the form (for a three-beam system)

instead of a scalar gain
2 

8 " x — or 

max 

then an overall increase in the convergence rate can be expected. If
we consider each dimension of the recursive algorithm separately, then
there are two possibilities for the search convergence. Either the gain
in that dimension is too small and the convergence is slow, or the gain
is too large and there is overshoot of the minimum. These effects have
been illustrated by the experimental results described earlier. In the
no-noise case, the recursive filter was at the optimum but the gain was
large and the first move was far from the optimum. The gain constant
then decreased in size and the convergence was slow but in the correct
direction. Using the knowledge of the eigenvalues in the matrix gain,
it is possible both to increase the convergence rate in the noisy
signals and to decrease the initial movement from the optimum in the
non-noisy signals. While this technique requires more arithmetic than the
scalar gain, an increase in convergence rate would justify its use.

The movement in any dimension is governed by how far one is from the
optimum and by the size of the gain. Using the knowledge of
the results in this research, it is possible to adjust each dimension
of the recursive algorithm separately. The most desirable situation is
to be able to separately adjust the gain in each dimension depending on
its position relative to the optimum and not on the general consideration
of all the dimensions. With the results derived here, increased
convergence can be expected and it is possible to develop a search strategy
closer to the optimum one than is possible with any scalar gain. Kushner
(75) has attempted to find the optimum step size in each dimension by a
minimization in each dimension. The technique proposed here would take
fewer calculations and is computationally simpler but it is, of
course, not the optimum movement at each step of the recursive algorithm.
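
The effect of a per-dimension gain can be seen in the idealized (noise-free) recursion written in eigen-coordinates, where the error in each dimension is multiplied by (1 - gain × eigenvalue) at every step. The sketch below works under that assumption only; the eigenvalues, the factor 0.5 in the matrix gain, and the function name iterations_to_converge are hypothetical choices for illustration.

```python
def iterations_to_converge(eigenvalues, gains, tol=1e-3, max_iter=10_000):
    """Iterate e_i <- (1 - gain_i * lambda_i) * e_i in each eigen-direction.

    Returns the number of iterations until every error component is below tol.
    """
    errors = [1.0] * len(eigenvalues)      # unit initial error in each dimension
    for k in range(1, max_iter + 1):
        errors = [(1.0 - g * lam) * e
                  for g, lam, e in zip(gains, eigenvalues, errors)]
        if max(abs(e) for e in errors) < tol:
            return k
    return max_iter

eigenvalues = [100.0, 10.0, 1.0]           # hypothetical three-beam spectrum
lam_max = max(eigenvalues)

# Scalar gain tied to the largest eigenvalue: the weakest dimension crawls.
scalar = iterations_to_converge(eigenvalues, [1.0 / lam_max] * 3)
# Matrix (diagonal) gain scaled by each eigenvalue: all dimensions move alike.
matrix = iterations_to_converge(eigenvalues, [0.5 / lam for lam in eigenvalues])
print(scalar, matrix)
```

With a scalar gain the dimension with the smallest eigenvalue dictates the iteration count, which is the slow-convergence possibility described above; scaling the gain per dimension removes that dependence on the eigenvalue spread.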

The implementation of the adaptive algorithm is extremely simple.
It involves only complex multiplications and additions. The time
considerations in Section 6.2 show how fast the algorithm can operate with
available technology. It has also been shown that it is computationally
simple to obtain the parameters used in the recursive algorithm.

Contained in the convergence proofs of the stochastic recursive
algorithm used to derive the adaptive filter was the important result
that the filter coefficients resulting from the successive operations
of the recursive algorithm were martingales. As was pointed out, this
fact is not only valuable in proving convergence, but is also useful in
stochastic stability considerations.

It has been shown that stochastic Lyapunov functions satisfy the
martingale property. This result has many possibilities to be exploited
in the analysis of stochastic control systems. It is possible to
develop a stochastic analog of deterministic Lyapunov functions. This
would give the control system designer a framework in which to analyze
stochastic control systems, which are becoming more prevalent in
engineering applications. Without this framework for stochastic stability,
it is very difficult to design reliable stochastic control systems when the
order of the system or the number of state variables is large.

The stochastic adaptive processor has application to a wide range
of different problems. It could be used as an adaptive filter for a
digital communication system. It could be used in biological research
for predicting a level of response in an experiment. It could be used
by statisticians to estimate probability distribution functions. Control
system engineers could use this technique for system identification and
process control. Workers in the field of pattern recognition and
machine learning could use these stochastic approximation principles to
great advantage. The stochastic technique has wide applicability in
seismic signal processing. In the area of array processing and digital
signal processing (communication systems and seismic signal processing),
the advent of the fast Fourier transform has made frequency domain
methods more attractive for real time use. The present technique of
adaptive processing gives a viable means for handling the complex
variables that arise naturally from frequency domain problems. A much
needed extension to the present work would be to build a mathematical
framework in which to handle a wider range of complex variable
minimization problems. Not all of the theory of real variables can be used
in complex variables. A simple extension of real analysis to complex
analysis is not possible in some important problems and, since most
frequency domain techniques involve complex variables, a wider framework
to handle these problems is important.

A unification with the techniques used in the decoupling theory
in multivariable control systems would be valuable because it would
give some insight into stochastic approximation theory from the point
of view of stability and observability in modern control theory. Since
stochastic approximation and stability theory are related through the
martingale property, it might be possible to simplify the convergence
proofs and provide a technique for a more nearly optimum search strategy.

The present stochastic adaptive filter is a valuable addition to
any detection or parameter estimation system. From the receiver
operating characteristic (ROC) curves, it can be seen that the higher the
input S/N is to any detection system, the greater the probability of
detection at a given false alarm rate. Since the present adaptive system
increases the S/N to over half of its theoretical maximum in a few
iterations, it can be used as an effective input to any optimum or
suboptimum detector.
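
The dependence of detection probability on input S/N can be illustrated with the textbook case of a known signal in white Gaussian noise. This model is an assumption made here for illustration and is not a detector analyzed in this work; the deflection is taken as the square root of the S/N power ratio, and the function name detection_probability is hypothetical.

```python
from statistics import NormalDist

def detection_probability(snr_db, false_alarm_rate):
    """Pd for a known signal in white Gaussian noise (assumed textbook model).

    The deflection d is taken as sqrt(S/N), with S/N converted from dB.
    """
    std_normal = NormalDist()
    d = (10.0 ** (snr_db / 10.0)) ** 0.5
    threshold = std_normal.inv_cdf(1.0 - false_alarm_rate)
    return 1.0 - std_normal.cdf(threshold - d)

# Probability of detection at a 1% false alarm rate for three input S/N values.
for snr_db in (-12.5, 0.0, 8.0):
    print(snr_db, round(detection_probability(snr_db, 1e-2), 3))
```

At a fixed false alarm rate the probability of detection rises monotonically with S/N, which is the property read from the ROC curves above.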

While  the  present  processing  system  has  not  been  considered  in 
the  framework  of  a complete  detection  system  but  only  as  an  input  to 
the  detection  part  of  any  receiver,  it  is  instructive  to  state  the 
following  results  about  the  space,  time,  and  detection  factorability  of 
optimum  processors.  These  results  become  important  when  a general  form 
for  the  various  optimum  array  processing  schemes  is  desired. 

Middleton and Groginsky (88) were the first to consider the
space-time factorability of optimum processors. They showed that
factorization is not, in general, possible in optimum passive array
processors.

The space-time structure of optimum active array processors and,
in particular, the factorability of such processors into spatial and
temporal operations was studied by Pasupathy and Venetsanopoulos (95).
They studied the problem for a linear continuous array in a
reverberation-limited environment and derived the conditions on signal,
reverberation, and array parameters under which such a factorization is
possible. They found that the factorability was limited, in general, to
narrowband systems.

Van Trees (125) has shown that for a certain class of optimality
criteria and signal models, all optimum space-time processing schemes
are composed of a beamformer or spatial processor followed by a scalar
filter and, for plane wave signals, the spatial processor is common to
all. Only the filter reflects the particular criterion of optimality
selected.

The processing system considered here, where beamforming is
performed before any adaptive filtering function, does not fit exactly
into the preceding categories of optimum processors. One major
difference is that the final filtering function is a matrix filter rather
than a scalar one. While the optimality criteria are in some cases the
same, it is in the particular implementation of the spatial and temporal
parts of the processor that the various schemes differ.

APPENDIX A
S/N CALCULATION

Since the output of the filter has been defined as

\[
y = H^T x , \tag{A.1}
\]

one can, because of the way the beams are constructed, calculate an
estimate of the signal-to-noise ratio at the output of the adaptive filter.
Knowing the input signal statistics (zero mean, variable variance normal
processes), one can calculate an approximate output signal-to-noise
ratio.

If we assume the means of all $s_n$ are zero and that
$E\{s_n s_n^*\} = \sigma_n^2$, then we can calculate the products

\[
y^n (y^n)^* = [(h^n)^T x^n]\,[(h^n)^T x^n]^* \tag{A.2}
\]

and collect the terms due to the signals alone and noise alone, and the
ratio of these terms is the output signal-to-noise ratio. The beams
($x^n$) are constructed in the following manner:

\begin{align}
x^1 &= (\mathrm{s1})\, s_1 + (a)\, s_2 + (b)\, s_3 + (\mathrm{cc})\, \mathit{sn}_1 \tag{A.3} \\
x^2 &= (f)\, s_1 + (\mathrm{s2})\, s_2 + (g)\, s_3 + (\mathrm{dd})\, \mathit{sn}_2 \tag{A.4} \\
x^3 &= (h)\, s_1 + (p)\, s_2 + (\mathrm{s3})\, s_3 + (\mathrm{ee})\, \mathit{sn}_3 \tag{A.5}
\end{align}

Using Equations (A.3), (A.4) and (A.5), the products indicated
in (A.2) can be calculated and the separation of signal terms from
noise terms in the resulting expressions permits calculation of an
output signal-to-noise ratio.


1 

y 


1*1111*  * * * * * * 

(y  ) = h (h  ) [(si  • si  )s^s^  + (a  • a )s2s2  + (b  • b )s3s3  + 

* * 11  12  * * * 

(cc  • cc  )sn1sn^  ] + h (h  ) [(si  • f ^^s^  + 

(a  • s2  )s2s2*  + (b  ' g*)s3s3  ] + h31(h33)  [(si  • h Js^  + 

(a  • p ) s 2s ^ + (b  • s3  )s3s3  ] + h32(h33)  [ (f  • si  >8^  + 

(s2  + a )s2s2  + (g  ' b ) S3S j ] + h12(h12)  [ (f  • f )s1s1  + 

* * ^ ^ £ £ 

(s2  • s2  )s2s0  + (g  • g )S3S3  + (dd  * dd  )sn2sn2  ] + 

h12(h13)  [ (f  • h )s1s1  + (s2  • p ) ® 2 s 2 + (8  * s3  )S3S3  1 + 

h33  (h33)  [ (h  • si  )s1s1  + (p  • a*)s2s2  + (s3  • b )s3s3  ] + 

h33(h32)  [ (h  • f Js^Sj.  + (p  • s2  )s2s2  + (s3  • g )s3s3  ] + 

h13(h33)  [ (h  • h )s1si  + (p  * p )s2s2  + (s3  ’ s3  ^S3S3  + 

•k  k 

(ee  • ee  )sn3sn3  ] (A. 6) 


2 

y 


2 * 21  21  *.  * * * * * * 

(y  ) * h (h  ) [(si  • si  >8^8^  + (a  • a )s2s2  + (b  • b )s3s3  + 

* * 21  22  * * * 

(cc  • cc  )snjSn3  ] + h (h  ) [(si  • f )s3s^  + 

(a  • 82*)s2s2*  + (b  • g*)s3s3  ] + h23(h23)*[  (si  • lOs^  + 
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* * * * 22  21  * * * 

(a  • p )s2s2  + (b  • s3  ] + h (h  ) [(f  si  + 

(s2  • a )s2s2  + (g  • b )s3s3  ] + h22(h22)  [ (f  • f >8.^  + 

* * * * "k  k 

(s2  • s2  )^2s2  + (8  • g )S3S3  + (dd  ' dd  )sn2sn2  ] + 

h22(h23)  [(f  • h )slSl*  + (s2  • p*)s2s2  + (g  • s3  )s3s3  ] -t- 

h23(h21)  [ (h  • si  )s1s1  + (p  • a )s2s2  + (s3  • b )s3s3  ] + 

OQ  OO  k -Jr  k k k k k 

h (h  ) [(h  • f >8.^  + (p  • s2  )s2s2  + (s3  • g )s3s3  ] + 

h23(h23)  [ (h  • h + (p  • p )s2s2  + (s3  • s3  )s3s3  + 

* * 

(ee  • ee  )sn3sn^  ] (A. 7) 

y3  • (y3)  = h31(h31)*[ (si  • sl*)s1s1*  + (a  • a*)s2s2  + (b  • b )s3s3  + 

(cc  • cc*)sn1sn1*]  + h31  (h32) * [ (si  • F^s^*  + 

(a  • s2  )s2s2  + (b  • g ) s 3s 3 ] + h31(h33)  [ (si  • h >8^  + 

* * **  32  31  * * * 

(a  • p )s2s2  + (b  • s3  ) s jS 3 ] + h (h  ) [(f  si  >8^  + 

(s2  + a )s2s2  + (g  • b )s3s3*]  + h32(h32)  [(f  • f >8^  + 

k k k k k * 

(s2  • s2  )s2s2  + (g  . g ) s 3s 3 + (dd  • dd  )sn2sn2  ] + 

h33(h31)  [ (h  • si  + (p  • a )s2s2  + (s3  • b )s3s3  ] + 

h33(h32)  [(h  • f + (p  • s2  )s2s2  + (s3  g )s3s3  ] + 

h33(h33)  [(h  • h >8^  + (p  • p ) ® 2® 2 + (s3  • s3  )&3s3  + 


d 


I 


I1 
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* * 32  33  * * * 

(ee  • ee  )sn^sn^  ] + h (h  ) [ (f  • h )s^s^  + 

(s2  • p )s2s2  + (g  ' s3  )s3s3  ] . (A. 8) 


If all the signal terms are separated from the noise terms in
any given beam output, then the output signal-to-noise ratio can be
calculated as the ratio of these terms. In simple terms, the output
signal-to-noise ratio can be defined as:

\[
\mathrm{S/N} = \mathrm{SNR} = \frac{H^T(f)\, G_{ss}(f)\, H^*(f)}{H^T(f)\, G_{nn}(f)\, H^*(f)} \tag{A.9}
\]
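
The ratio in (A.9) can be evaluated numerically for a single beam output. The sketch below is illustrative only: it treats one filter column h at a time, builds the signal and noise spectral matrices from hypothetical coupling vectors and powers, and uses helper names (outer, add, quadratic_form, output_snr_db) that do not appear in this work.

```python
import math

def outer(c, power):
    """Return power * c c^H for a coupling vector c."""
    return [[power * ci * cj.conjugate() for cj in c] for ci in c]

def add(a, b):
    """Element-wise sum of two square matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def quadratic_form(h, g):
    """Return h^T G h* for a complex vector h and a square matrix G."""
    n = len(h)
    return sum(h[i] * g[i][j] * h[j].conjugate()
               for i in range(n) for j in range(n))

def output_snr_db(h, g_ss, g_nn):
    """Ratio of the signal and noise quadratic forms, as in (A.9), in dB."""
    signal = quadratic_form(h, g_ss).real
    noise = quadratic_form(h, g_nn).real
    return 10.0 * math.log10(signal / noise)

# Hypothetical three-beam example: s_1 appears only in Beam 1, while s_2
# (100 times stronger) leaks into Beams 1 and 3 with coupling 0.5.
g_ss = outer([1 + 0j, 0j, 0j], 1.0)
interference = outer([0.5 + 0j, 1 + 0j, 0.5 + 0j], 100.0)
sensor_noise = [[0.1 if i == j else 0.0 for j in range(3)] for i in range(3)]
g_nn = add(interference, sensor_noise)

before = output_snr_db([1 + 0j, 0j, 0j], g_ss, g_nn)        # Beam 1 passed through
after = output_snr_db([1 + 0j, -0.5 + 0j, 0j], g_ss, g_nn)  # s_2 cancelled using Beam 2
print(f"{before:.1f} dB -> {after:.1f} dB")
```

In this toy case the second filter subtracts half of Beam 2 from Beam 1, which removes the coupled interference and leaves only the uncorrelated sensor noise in the denominator.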


APPENDIX  B 


LINEAR  SYSTEM  RELATIONSHIPS 


The single output of a multi-input, multi-output system is
written as

\[
y^i(f) = [h^i(f)]^T x(f) \tag{B.1}
\]

and in matrix form,

\[
y(f) = H^T(f)\, x(f) , \tag{B.2}
\]

where

\[
y(f) = \begin{bmatrix} y^1(f) \\ y^2(f) \\ \vdots \\ y^N(f) \end{bmatrix} , \qquad
x(f) = \begin{bmatrix} x^1(f) \\ x^2(f) \\ \vdots \\ x^N(f) \end{bmatrix}
\]

and

\[
H(f) = [\, h^1(f) \;\; h^2(f) \;\; \cdots \;\; h^N(f) \,]
= \begin{bmatrix}
h^{11}(f) & h^{12}(f) & \cdots & h^{1N}(f) \\
h^{21}(f) & h^{22}(f) & \cdots & h^{2N}(f) \\
\vdots & \vdots & & \vdots \\
h^{N1}(f) & h^{N2}(f) & \cdots & h^{NN}(f)
\end{bmatrix} .
\]

The output power spectral density matrix is given by:

\[
G_{yy}(f) = H^T(f)\, G_{xx}(f)\, H^*(f) = [H^T(f)\, x(f)]\,[H^T(f)\, x(f)]^H . \tag{B.3}
\]

Some useful relations for the auto-spectrum and cross-spectrum
matrices are:

\begin{align}
[G_{xx}^H(f)]^{-1} &= [G_{xx}^{-1}(f)]^H \tag{B.4} \\
G_{xz}^H(f) &= G_{zx}(f) \tag{B.5} \\
G_{xx}^H(f) &= G_{xx}(f) \quad \text{(Hermitian property)} \tag{B.6}
\end{align}

APPENDIX C

PROBABILITY THEORY DEFINITIONS
(C.1) Conditional Expectations

The conditional distribution function $F_{x|y}(x|y)$ of x given
Y = y is defined by

\[
F_{x|y}(x|y) = \frac{\mathrm{PROB}\{X \le x,\ Y = y\}}{\mathrm{PROB}\{Y = y\}}
\quad \text{for } \mathrm{PROB}\{Y = y\} > 0 . \tag{C.1}
\]

For any values x and y, we can define (Karlin and Taylor, 66)

\[
\mathrm{PROB}\{X \le x,\ Y \le y\} = \int_{-\infty}^{y} F_{x|y}(x|z)\, dF_y(z) . \tag{C.2}
\]

To define the law of total probability, we let $y = \infty$ in Equation
(C.2) and get

\[
\mathrm{PROB}\{X \le x\} = \mathrm{PROB}\{X \le x,\ Y \le \infty\}
= \int_{-\infty}^{+\infty} F_{x|y}(x|y)\, dF_y(y) . \tag{C.3}
\]

For any bounded functions g(x) and h(y), Equation (C.3) becomes

\[
E\{g(x)\, h(y)\} = E\{E[g(x) \mid y]\, h(y)\} . \tag{C.4}
\]

The following equations list some conditional expectation
identities.

\begin{align}
E\{h(X, Y) \mid Y = y\} &= E\{h(X, y) \mid Y = y\} \tag{C.5} \\
E\{g(x)\, f(y) \mid y\} &= f(y)\, E\{g(x) \mid y\} \tag{C.6} \\
E\{c \mid y\} &= c \quad \text{for } c \text{ a constant} \tag{C.7} \\
|E\{x^T y\}|^2 &\le E\{\|x\|^2\}\, E\{\|y\|^2\} . \tag{C.8}
\end{align}
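
Identity (C.4) can be verified exactly on a small discrete example. The joint probability table and the functions g and h below are hypothetical values chosen only to make the check concrete; exact rational arithmetic is used so that both sides can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical joint probabilities P(X = x, Y = y) on a 2 x 2 grid.
joint = {(0, 0): Fraction(1, 8), (1, 0): Fraction(3, 8),
         (0, 1): Fraction(1, 4), (1, 1): Fraction(1, 4)}

def g(x):
    return 2 * x + 1

def h(y):
    return y + 3

def marginal_y(y):
    """P(Y = y) obtained by summing the joint table over x."""
    return sum(p for (_, yy), p in joint.items() if yy == y)

def cond_expect_g(y):
    """E{g(X) | Y = y} computed from the joint table."""
    return sum(g(x) * p for (x, yy), p in joint.items() if yy == y) / marginal_y(y)

# Left side of (C.4): E{g(x) h(y)} taken over the joint distribution.
lhs = sum(g(x) * h(y) * p for (x, y), p in joint.items())
# Right side of (C.4): E{E[g(x)|y] h(y)} taken over the marginal of y.
rhs = sum(cond_expect_g(y) * h(y) * marginal_y(y) for y in {y for _, y in joint})
print(lhs, rhs)
```

Both sides reduce to the same rational number, as the identity requires.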

(C.2)  Martingales 

Definition C.1. A martingale is defined as a real or complex
stochastic process $\{x_n, n \in N\}$ for which $E\{|x_n|\} < \infty$,
$n \in N$, and

\[
E\{x_{n+1} \mid x_1, \ldots, x_n\} = x_n \tag{C.9}
\]

with probability 1. If the equal sign in Equation (C.9) is replaced by
$\le$, then the process is a supermartingale.

If an $x_n$ process is a martingale, the processes defined by the
real and imaginary parts of the $x_n$'s are also martingales.

Definition C.2. Let $\{x_n, \mathcal{F}_n, n = 1, 2, \ldots\}$ be a
martingale on a probability space $(\Omega, \mathcal{F}, P)$ with
$\sup_n E\{|x_n|\} < \infty$. The following conditions are all
equivalent:

\begin{align}
&\{x_n, \mathcal{F}_n, n = 1, 2, \ldots, \infty\} \text{ is a martingale} \tag{C.10} \\
&\lim_{n \to \infty} E\{|x_n - x_\infty|\} = 0 \tag{C.11} \\
&\lim_{n \to \infty} E\{|x_n|\} = E\{|x_\infty|\} \tag{C.12}
\end{align}